TITLE OF THE INVENTION

GENES OF CAROTENOID BIOSYNTHESIS AND METABOLISM 
AND A SYSTEM FOR SCREENING FOR SUCH GENES 

BACKGROUND OF THE INVENTION 
Field of the Invention 

The present invention describes the DNA sequences of
eukaryotic genes encoding ε cyclase, isopentenyl pyrophosphate
(IPP) isomerase and β-carotene hydroxylase, as well as vectors
containing the same and hosts transformed with said vectors.
The present invention also provides a method for augmenting 
the accumulation of carotenoids and production of novel and 
rare carotenoids. The present invention provides methods for 
controlling the ratio of various carotenoids in a host. 
Additionally, the present invention provides a method for 
screening for eukaryotic genes encoding enzymes of carotenoid 
biosynthesis and metabolism. 

Discussion of the Background 

Carotenoid pigments with cyclic endgroups are essential
components of the photosynthetic apparatus in oxygenic
photosynthetic organisms (e.g., cyanobacteria, algae and
plants; Goodwin, 1980). The symmetrical bicyclic yellow
carotenoid pigment β-carotene (or, in rare cases, the
asymmetrical bicyclic α-carotene) is intimately associated
with the photosynthetic reaction centers and plays a vital
role in protecting against potentially lethal photooxidative
damage (Koyama, 1991). β-Carotene and other carotenoids
derived from it or from α-carotene also serve as
light-harvesting pigments (Siefermann-Harms, 1987), are involved in
the thermal dissipation of excess light energy captured by the
light-harvesting antenna (Demmig-Adams & Adams, 1992), provide
substrate for the biosynthesis of the plant growth regulator
abscisic acid (Rock & Zeevaart, 1991; Parry & Horgan, 1991),
and are precursors of vitamin A in human and animal diets
(Krinsky, 1987). Plants also exploit carotenoids as coloring
agents in flowers and fruits to attract pollinators and agents
of seed dispersal (Goodwin, 1980). The color provided by
carotenoids is also of agronomic value in a number of
important crops. Carotenoids are currently harvested from
plants for use as pigments in food and feed.

The probable pathway for formation of cyclic carotenoids
in plants, algae and cyanobacteria is illustrated in Figure 1.
Two types of cyclic endgroups are commonly found in higher
plant carotenoids; these are referred to as the β and ε cyclic
endgroups (Fig. 3; the acyclic endgroup is referred to as the
ψ or psi endgroup). These cyclic endgroups differ only in the
position of the double bond in the ring. Carotenoids with two
β rings are ubiquitous, and those with one β and one ε ring
are common, but carotenoids with two ε rings are rarely
detected. β-Carotene (Fig. 1) has two β endgroups and is a
symmetrical compound that is the precursor of a number of
other important plant carotenoids such as zeaxanthin and
violaxanthin (Fig. 2).
Carotenoid enzymes have previously been isolated from a
variety of sources including bacteria (Armstrong et al., 1989,
Mol. Gen. Genet. 216, 254-268; Misawa et al., 1990, J.
Bacteriol. 172, 6704-12), fungi (Schmidhauser et al., 1990,
Mol. Cell. Biol. 10, 5064-70), cyanobacteria (Chamovitz et
al., 1990, Z. Naturforsch. 45c, 482-86) and higher plants
(Bartley et al., Proc. Natl. Acad. Sci. USA 88, 6532-36;
Martinez-Ferez & Vioque, 1992, Plant Mol. Biol. 18, 981-83).
Many of the isolated enzymes show great diversity in
function and inhibitory properties between sources. For
example, phytoene desaturases from Synechococcus and higher
plants carry out a two-step desaturation to yield ζ-carotene
as a reaction product, whereas the same enzyme from Erwinia
introduces four double bonds, forming lycopene. Similarity of
the amino acid sequences is very low for bacterial versus
plant enzymes. Therefore, even with a gene in hand from one
source, it is difficult to screen for a gene with similar
function in another source. In particular, the sequence
similarity between prokaryotic and eukaryotic genes is quite
low.

Further, the mechanisms of gene expression in prokaryotes
and eukaryotes appear to differ sufficiently that one
cannot expect an isolated eukaryotic gene to be
properly expressed in a prokaryotic host.
The difficulty of isolating related genes is
exemplified by recent efforts to isolate the enzyme which
catalyzes the formation of β-carotene from the acyclic
precursor lycopene. Although this enzyme had been isolated in
a prokaryote, it had not been isolated from any photosynthetic
organism, nor had the corresponding genes been identified and
sequenced or the cofactor requirements established. The
isolation and characterization of the enzyme catalyzing
formation of β-carotene in the cyanobacterium Synechococcus
PCC7942 was described by the present inventors and others
(Cunningham et al., 1993 and 1994).

The need remains for the isolation of eukaryotic genes
involved in the carotenoid biosynthetic pathway, including
genes encoding ε cyclase, IPP isomerase and β-carotene
hydroxylase. There remains a need for methods to enhance the
production of carotenoids. There also remains a need in the
art for methods for screening for eukaryotic genes encoding
enzymes of carotenoid biosynthesis and metabolism.

SUMMARY OF THE INVENTION 

Accordingly, a first object of this invention is to
provide isolated eukaryotic genes which encode enzymes
involved in carotenoid biosynthesis; in particular, ε cyclase,
IPP isomerase and β-carotene hydroxylase.
A second object of this invention is to provide
eukaryotic genes which encode enzymes which produce novel 
carotenoids. 

A third object of the present invention is to provide 
vectors containing said genes. 

A fourth object of the present invention is to provide 
hosts transformed with said vectors. 

Another object of the present invention is to provide
hosts which accumulate novel or rare carotenoids or which
overproduce known carotenoids.

Another object of the present invention is to provide 
hosts with inhibited carotenoid production. 

Another object of this invention is to secure the 
expression of eukaryotic carotenoid-related genes in a 
recombinant prokaryotic host. 

A final object of the present invention is to provide a 
method for screening for eukaryotic genes which encode enzymes 
involved in carotenoid biosynthesis and metabolism. 

These and other objects of the present invention have 
been realized by the present inventors as described below. 

BRIEF DESCRIPTION OF THE DRAWINGS 
A more complete appreciation of the invention and many of
the attendant advantages thereof will be readily obtained as
the same becomes better understood by reference to the
following detailed description when considered in connection
with the accompanying drawings, wherein:

Figure 1 is a schematic representation of the pathway of
β-carotene biosynthesis in cyanobacteria, algae and plants.
The enzymes catalyzing various steps are indicated at the
left. Target sites of the bleaching herbicides NFZ and MPTA
are also indicated at the left. Abbreviations: DMAPP,
dimethylallyl pyrophosphate; FPP, farnesyl pyrophosphate;
GGPP, geranylgeranyl pyrophosphate; GPP, geranyl
pyrophosphate; IPP, isopentenyl pyrophosphate; LCY, lycopene
cyclase; MVA, mevalonic acid; MPTA, 2-(4-methylphenoxy)triethylamine
hydrochloride; NFZ, norflurazon; PDS, phytoene desaturase;
PSY, phytoene synthase; ZDS, ζ-carotene desaturase; PPPP,
prephytoene pyrophosphate.

Figure 2 depicts possible routes of synthesis of cyclic
carotenoids and common plant and algal xanthophylls
(oxycarotenoids) from neurosporene. Demonstrated activities
of the β- and ε-cyclase enzymes of A. thaliana are indicated
by bold arrows labelled with β or ε, respectively. A bar below
the arrow leading to ε-carotene indicates that the enzymatic
activity was examined but no product was detected. The steps
marked by an arrow with a dotted line have not been
specifically examined. Conventional numbering of the carbon
atoms is given for neurosporene and α-carotene. Inverted
triangles mark positions of the double bonds introduced as
a consequence of the desaturation reactions.
Figure 3 depicts the carotene endgroups which are found
in plants. 

Figure 4 is a DNA sequence and the predicted amino acid
sequence of ε cyclase isolated from A. thaliana (SEQ ID NOS: 1
and 2). These sequences were deposited under GenBank
accession number U50738. This cDNA is incorporated into the
plasmid pATeps.

Figure 5 is a DNA sequence encoding the β-carotene
hydroxylase isolated from A. thaliana (SEQ ID NO: 3). This
cDNA is incorporated into the plasmid pATOHB.

Figure 6 is an alignment of the predicted amino acid
sequences of A. thaliana β-carotene hydroxylase (SEQ ID NO: 4)
with the bacterial enzymes from Alcaligenes sp. (SEQ ID NO: 5)
(GenBank D58422), Erwinia herbicola Eho10 (SEQ ID NO: 6)
(GenBank M87280), Erwinia uredovora (SEQ ID NO: 7) (GenBank
D90087) and Agrobacterium aurantiacum (SEQ ID NO: 8) (GenBank
D58420). A consensus sequence is also shown. A capital letter
indicates that the residue is identical in all five sequences;
a lowercase letter indicates that three of the five, including
A. thaliana, have the identical residue. TM, transmembrane.

Figure 7 is a DNA sequence of a cDNA encoding an IPP
isomerase isolated from A. thaliana (SEQ ID NO: 9). This cDNA
is incorporated into the plasmid pATDPS.

Figure 8 is a DNA sequence of a second cDNA encoding 
another IPP isomerase isolated from A. thaliana (SEQ ID NO: 
10). This cDNA is incorporated into the plasmid pATDP7. 
Figure 9 is a DNA sequence of a cDNA encoding an IPP
isomerase isolated from Haematococcus pluvialis (SEQ ID NO:
11). This cDNA is incorporated into the plasmid pHP04.

Figure 10 is a DNA sequence of a second cDNA encoding 
another IPP isomerase isolated from Haematococcus pluvialis 
(SEQ ID NO: 12). This cDNA is incorporated into the plasmid
pHP05.

Figure 11 is an alignment of the predicted amino acid
sequences of the IPP isomerases isolated from A. thaliana (SEQ
ID NOS: 16 and 18), H. pluvialis (SEQ ID NOS: 14 and 15),
Clarkia breweri (SEQ ID NO: 17) (see Blanc & Pichersky,
Plant Physiol. (1995) 108:855; GenBank accession no. X82627)
and Saccharomyces cerevisiae (SEQ ID NO: 19) (GenBank
accession no. J05090).

Figure 12 is a DNA sequence of the cDNA encoding an IPP
isomerase isolated from marigold (SEQ ID NO: 13). This cDNA
is incorporated into the plasmid pPMDP1. The letters xxx denote a
region not yet sequenced at the time this application was
prepared.

Figure 13 is an alignment of the consensus sequence of 4
plant β-cyclases (SEQ ID NO: 20) with the A. thaliana ε-cyclase
(SEQ ID NO: 21). A capital letter in the plant β
consensus is used where all 4 β cyclase genes predict the same
amino acid residue in this position. A lowercase letter indicates
that an identical residue was found in 3 of the 4. Dashes
indicate that the amino acid residue was not conserved, and
dots in the sequence denote a gap. A consensus for the
aligned sequences is given in capital letters below the
alignment where the β and ε cyclases have the same amino acid
residue. Arrows indicate some of the conserved amino acids
that will be used as junction sites for construction of
chimeric cyclases with novel enzymatic activities. Several
regions of interest, including a sequence signature indicative
of a dinucleotide-binding motif and 2 predicted transmembrane
(TM) helical regions, are indicated below the alignment and are
underlined.
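The consensus rule in the Figure 13 legend amounts to a per-column vote over the aligned β-cyclase sequences. The following Python sketch illustrates that rule only; it assumes the sequences are already aligned to equal length with '.' marking gaps, and the function name `beta_consensus` and the short test sequences are hypothetical, not taken from SEQ ID NO: 20.

```python
from collections import Counter


def beta_consensus(aligned, min_lowercase=3, gap="."):
    """Per-column consensus following the Figure 13 legend (sketch).

    Capital letter: every sequence has the same residue.
    Lowercase letter: at least `min_lowercase` sequences share it
    (3 of 4 for the four plant beta-cyclases).
    Dash: the residue is not conserved. Gap characters are not counted.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    n = len(aligned)
    consensus = []
    for column in zip(*aligned):
        counts = Counter(c.upper() for c in column if c != gap)
        if not counts:
            consensus.append("-")  # all-gap column: nothing to report
            continue
        residue, k = counts.most_common(1)[0]
        if k == n:
            consensus.append(residue)
        elif k >= min_lowercase:
            consensus.append(residue.lower())
        else:
            consensus.append("-")
    return "".join(consensus)


toy = ["MKVLAS", "MKVLGT", "MRVLAS", "MKILAT"]  # hypothetical, not SEQ ID NO: 20
print(beta_consensus(toy))  # MkvLa-
```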

DESCRIPTION OF THE PREFERRED EMBODIMENTS 
Isolated eukaryotic genes which encode enzymes involved in
carotenoid biosynthesis

The present inventors have now isolated eukaryotic genes
encoding ε cyclase and β-carotene hydroxylase from A. thaliana
and IPP isomerases from several sources.

The present inventors have now isolated the eukaryotic
gene encoding the enzyme IPP isomerase, which catalyzes the
conversion of isopentenyl pyrophosphate (IPP) to dimethylallyl
pyrophosphate (DMAPP). IPP isomerases were isolated from A.
thaliana, H. pluvialis and marigold.

Alignments of these sequences (excluding the marigold
sequence) are shown in Figure 11. Plasmids containing these genes were
deposited with the American Type Culture Collection, 12301
Parklawn Drive, Rockville MD 20852 on March 4, 1996 under ATCC
accession numbers 98000 (pHP05 - H. pluvialis); 98001 (pMDP1 -
marigold); 98002 (pATDP7 - H. pluvialis) and 98004 (pHP04 - H.
pluvialis).

The present inventors have also isolated the gene
encoding the enzyme ε cyclase, which is responsible for the
formation of ε endgroups in carotenoids. A gene encoding an ε
cyclase from any organism has not heretofore been described.
The A. thaliana ε cyclase adds an ε ring to only one end of
the symmetrical lycopene, while the related β cyclase adds a
ring at both ends. The DNA of the present invention is shown
in Figure 4 and SEQ ID NO: 1. A plasmid containing this gene
was deposited with the American Type Culture Collection, 12301
Parklawn Drive, Rockville MD 20852 on March 4, 1996 under ATCC
accession number 98005 (pATeps - A. thaliana).

The present inventors have also isolated the gene
encoding the enzyme β-carotene hydroxylase, which is
responsible for hydroxylating the β endgroup in carotenoids.
The DNA of the present invention is shown in SEQ ID NO: 3 and
Figure 5. The full-length gene product hydroxylates both
endgroups of β-carotene, as do the products of genes which encode
proteins truncated by up to 50 amino acids from the N-terminus.
The products of genes which encode proteins truncated by
about 60-110 amino acids from the N-terminus preferentially
hydroxylate only one ring. A plasmid containing this gene was
deposited with the American Type Culture Collection, 12301
Parklawn Drive, Rockville MD 20852 on March 4, 1996 under ATCC
accession number 98003 (pATOHB - A. thaliana).

Eukaryotic genes which encode enzymes which produce novel or
rare carotenoids

The present invention also relates to novel enzymes which
can transform known carotenoids into novel or rare products.
That is, ε-carotene (see Figure 2) and γ-carotene currently
can only be isolated in minor amounts. As described below, an
enzyme can be produced which would transform lycopene to
γ-carotene, and another which would transform lycopene to
ε-carotene. With these products in hand, bulk synthesis of
other carotenoids derived from them is possible. For example,
ε-carotene can be hydroxylated to form isomers of lutein (one ε
and one β ring) and zeaxanthin (two β rings) in which both
endgroups are instead ε rings.

The eukaryotic genes of the carotenoid biosynthetic
pathway differ from their prokaryotic counterparts in their 5'
region. As used herein, the 5' region is the region of
eukaryotic DNA which precedes the initiation codon of the
counterpart gene in prokaryotic DNA. That is, when the
conserved regions of eukaryotic and prokaryotic genes are
aligned, the eukaryotic genes contain additional coding
sequences upstream of the prokaryotic initiation codon.
The present inventors have found that the amount of the
5' region present can alter the activity of the eukaryotic
enzyme. Instead of diminishing activity, truncating the 5'
region of the eukaryotic gene results in an enzyme with a
different specificity. Thus, the present invention relates to
enzymes which are truncated to within 0-50, preferably 0-25,
codons of the initiation codon of their prokaryotic
counterparts, as determined by alignment maps.
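The truncation limit above is measured from the point where the prokaryotic initiation codon falls on the eukaryotic sequence once the two proteins are aligned. As a rough sketch, assuming a pairwise protein alignment with '-' for gaps and reading "within 0-50 codons" as 0-50 codons upstream of that aligned start, the window of candidate start codons could be computed as follows; the helper names `aligned_start` and `truncation_window` and the toy alignment are hypothetical.

```python
def aligned_start(euk_aln, prok_aln, gap="-"):
    """1-based position in the eukaryotic protein aligned with the first
    residue (initiation codon) of the prokaryotic protein."""
    if len(euk_aln) != len(prok_aln):
        raise ValueError("alignment rows must have equal length")
    euk_pos = 0
    for e, p in zip(euk_aln, prok_aln):
        if e != gap:
            euk_pos += 1  # count eukaryotic residues seen so far
        if p != gap:
            # first non-gap column of the prokaryotic row is its start
            if e == gap:
                raise ValueError("prokaryotic start aligns with a eukaryotic gap")
            return euk_pos
    raise ValueError("prokaryotic row contains no residues")


def truncation_window(euk_aln, prok_aln, max_upstream=50):
    """(first, last) 1-based eukaryotic codons at which a truncated reading
    frame may begin while lying 0..max_upstream codons upstream of the
    aligned prokaryotic start."""
    start = aligned_start(euk_aln, prok_aln)
    return max(1, start - max_upstream), start


euk = "MSTAPLRQGSMKVLIE"   # hypothetical eukaryotic protein
prok = "----------MKVLIE"  # hypothetical prokaryotic counterpart
print(truncation_window(euk, prok, max_upstream=5))  # (6, 11)
```

A truncated gene beginning at codon s lacks the first s - 1 codons; in practice a new initiation codon would also have to be supplied, which the sketch does not model.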

For example, as discussed above, when the gene encoding
A. thaliana β-carotene hydroxylase was truncated, the
resulting enzyme catalyzed the formation of β-cryptoxanthin as the
major product and zeaxanthin as a minor product, in contrast to
its normal production of zeaxanthin.

In addition to novel enzymes produced by truncating the
5' region of known enzymes, novel enzymes which can
participate in the formation of novel carotenoids can be
formed by replacing portions of one gene with the analogous
sequence from a structurally related gene. For example,
β-cyclase and ε-cyclase are structurally related (see Figure
13). By replacing a portion of β-lycopene cyclase with the
analogous portion of ε-cyclase, an enzyme which produces
γ-carotene (one cyclic endgroup) will be produced. Further, by
replacing a portion of ε-lycopene cyclase with the analogous
portion of β-cyclase, an enzyme which produces ε-carotene will
be produced (ε-cyclase normally produces a compound with one ε
endgroup (δ-carotene), not two). Similarly, β-hydroxylase could
be modified to produce enzymes of novel function by creation
of hybrids with ε-hydroxylase.
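The chimeric enzymes described above join part of one protein to the analogous part of a related protein at a conserved residue, such as those marked by arrows in Figure 13. The sketch below shows that bookkeeping at the protein-alignment level only; it assumes both sequences are aligned with '.' for gaps, and the sequences, function names and junction choice are hypothetical. An actual construct would be built at the DNA level with the fusion kept in frame.

```python
def junction_columns(a, b, gap="."):
    """Alignment columns where both rows carry the same non-gap residue."""
    if len(a) != len(b):
        raise ValueError("rows must be aligned to equal length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x == y and x != gap]


def chimera(n_donor, c_donor, column, gap="."):
    """N-terminal part of n_donor up to and including `column`, followed by
    the C-terminal part of c_donor after it; gaps are removed."""
    if column not in junction_columns(n_donor, c_donor, gap):
        raise ValueError("junction must be a conserved, non-gap column")
    return (n_donor[: column + 1] + c_donor[column + 1 :]).replace(gap, "")


beta_aln = "MKTWA.LDG"  # hypothetical beta-cyclase fragment
eps_aln = "MRSWAGIEG"   # hypothetical epsilon-cyclase fragment
print(junction_columns(beta_aln, eps_aln))  # [0, 3, 4, 8]
print(chimera(beta_aln, eps_aln, 3))        # MKTWAGIEG
```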

Vectors 

The genes encoding the carotenoid enzymes described
above, when cloned into a suitable expression vector, can be
used to overexpress these enzymes in a plant expression system
or to inhibit the expression of these enzymes. For example, a
vector containing the gene encoding ε-cyclase can be used to
increase the amount of α-carotene in an organism and thereby
alter the nutritional value, pharmacological properties and
visual appearance of the organism.

In a preferred embodiment, the vectors of the present
invention contain a DNA encoding a eukaryotic IPP isomerase
upstream of a DNA encoding a second eukaryotic carotenoid
enzyme. The inventors have discovered that inclusion of an
IPP isomerase gene increases the supply of substrate for the
carotenoid pathway, thereby enhancing the production of
carotenoid endproducts. This is apparent from the much deeper
pigmentation of carotenoid-accumulating colonies of E. coli
which also contain one of the aforementioned IPP isomerase
genes, compared with colonies that lack this additional IPP
isomerase gene. Similarly, a vector comprising an IPP
isomerase gene can be used to enhance production of any
secondary metabolite of dimethylallyl pyrophosphate (such as
isoprenoids, steroids, carotenoids, etc.).
Alternatively, an antisense strand of one of the above
genes can be inserted into a vector. For example, the ε-cyclase
gene can be inserted into a vector in the antisense orientation
and incorporated into the genomic DNA of a host, thereby
inhibiting the synthesis of ε,β carotenoids (lutein and
α-carotene) and enhancing the synthesis of β,β carotenoids
(zeaxanthin and β-carotene).
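An antisense construct carries the coding sequence in reverse orientation relative to the promoter, so that the transcript is complementary to the normal mRNA. The sequence manipulation involved is a reverse complement, sketched below in Python; the function name and the example fragment are invented for illustration and are not part of any sequence disclosed here.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence, i.e. the antisense strand
    read 5'->3'. Only A, C, G, T and N are accepted."""
    unexpected = set(seq) - set("ACGTNacgtn")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return seq.translate(_COMPLEMENT)[::-1]


sense = "ATGGCTTCA"  # invented fragment, 5'->3'
print(reverse_complement(sense))  # TGAAGCCAT
```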

Suitable vectors according to the present invention,
comprising a eukaryotic gene encoding an enzyme involved in
carotenoid biosynthesis or metabolism and a promoter suitable
for the host, can be constructed using techniques well known in
the art (for example, Sambrook et al., Molecular Cloning: A
Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring
Harbor, NY, 1989).

Suitable vectors for eukaryotic expression in plants are
described in Frey et al., Plant J. (1995) 8(5):693 and Misawa
et al., 1994a, both incorporated herein by reference.

Suitable vectors for prokaryotic expression include
pACYC184, pUC119 and pBR322 (available from New England
BioLabs, Beverly, MA), pTrcHis (Invitrogen) and pET28
(Novagen), and derivatives thereof.

The vectors of the present invention can additionally
contain regulatory elements such as promoters and repressors,
and selectable markers such as antibiotic resistance genes, etc.



Hosts

Host systems according to the present invention can 
comprise any organism that already produces carotenoids or 
which has been genetically modified to produce carotenoids. 
The IPP isomerase genes are more broadly applicable for
enhancing production of any product dependent on DMAPP as a
precursor.

Organisms which already produce carotenoids include
plants, algae, some yeasts, fungi and cyanobacteria and other
photosynthetic bacteria. Transformation of these hosts with
vectors according to the present invention can be done using
standard techniques such as those described in Misawa et al.
(1990) supra; Hundle et al. (1993) supra; Hundle et al.
(1991) supra; Misawa et al. (1991) supra; Sandmann et al.,
supra; and Schnurr et al., supra; all incorporated herein by
reference.

Alternatively, transgenic organisms can be constructed
which include the DNA sequences of the present invention (Bird
et al., 1991; Bramley et al., 1992; Misawa et al., 1994a; Misawa
et al., 1994b; Cunningham et al., 1993). The incorporation of
these sequences can allow control of carotenoid
biosynthesis, content, or composition in the host cell. These
transgenic systems can be constructed to incorporate sequences
which allow over-expression of the carotenoid genes of the
present invention. Transgenic systems can also be constructed
containing antisense expression of the DNA sequences of the
present invention. Such antisense expression would result in
the accumulation of the substrates of the enzyme encoded by
the sense strand.

WO 97/36998    PCT/US97/00540

A method for screening for eukaryotic genes which encode
enzymes involved in carotenoid biosynthesis

The method of the present invention comprises
transforming a prokaryotic host with a DNA which may contain a
eukaryotic or prokaryotic carotenoid biosynthetic gene;
culturing said transformed host to obtain colonies; and
screening for colonies exhibiting a different color than
colonies of the untransformed host.

Suitable hosts include E. coli, cyanobacteria such as
Synechococcus and Synechocystis, algae and plant cells.
E. coli is preferred.

In a preferred embodiment, the above "color 
complementation test" can be enhanced by using mutants which 
are either (1) deficient in at least one carotenoid 
biosynthetic gene or (2) overexpress at least one carotenoid 
biosynthetic gene. In either case, such mutants will 
accumulate carotenoid precursors. 

Prokaryotic and eukaryotic DNA libraries can be screened
in total for the presence of genes of carotenoid biosynthesis,
metabolism and degradation. Preferred organisms to be
screened include photosynthetic organisms.
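How many colonies must be examined to screen a library in total can be estimated with the Clarke-Carbon relation N = ln(1 - P) / ln(1 - f), where f is the fraction of the library made up by the clone sought and P is the desired probability of recovering it at least once. The sketch assumes a clone present once among the independent clones of the library and random sampling; the library size in the usage line is hypothetical, while the figure of 10,000 colony-forming units per plate is the plating density used in the Examples.

```python
import math


def clones_to_screen(library_size, probability=0.99):
    """Clarke-Carbon estimate: colonies to examine so that a clone present
    once among `library_size` independent clones is sampled at least once
    with the given probability."""
    if library_size < 2 or not 0 < probability < 1:
        raise ValueError("need library_size >= 2 and 0 < probability < 1")
    f = 1.0 / library_size  # fraction of the library made up by the clone
    return math.ceil(math.log1p(-probability) / math.log1p(-f))


def plates_needed(colonies, per_plate=10_000):
    """Plates required at a fixed plating density."""
    return math.ceil(colonies / per_plate)


n = clones_to_screen(100_000)  # hypothetical library size
print(n, plates_needed(n))
```

For a hypothetical library of 100,000 independent clones this gives roughly 460,500 colonies for 99% confidence, or 47 plates at 10,000 colonies per plate.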






E. coli can be transformed with these eukaryotic cDNA
libraries using conventional methods such as those described
in Sambrook et al., 1989, and according to protocols described
by the vendors of the cloning vectors.

For example, cDNA libraries in bacteriophage vectors
such as lambdaZAP (Stratagene) or lambdaZIPLOX (Gibco BRL)
can be excised en masse and used to transform E. coli, or the
cDNA inserts can be inserted into suitable vectors and these
vectors can then be used to transform E. coli. Suitable vectors
include pACYC184, pUC119 and pBR322 (available from New England
BioLabs, Beverly, MA). pACYC is preferred.

Transformed E. coli can be cultured using conventional 
techniques. The culture broth preferably contains antibiotics 
to select and maintain plasmids. Suitable antibiotics include
penicillin, ampicillin, chloramphenicol, etc. Culturing is
typically conducted at 20-40°C, preferably at room temperature
(20-25°C), for 12 hours to 7 days.

Cultures are plated and the plates are screened visually
for colonies with a different color than the colonies of the
untransformed host E. coli. For example, E. coli transformed
with the plasmid pAC-BETA (described below) produces yellow
colonies that accumulate β-carotene. After transformation
with a cDNA library, colonies which display a different hue
than those formed by E. coli/pAC-BETA would be expected to
contain enzymes which modify the structure or degree of
expression of β-carotene. Similar standards can be engineered
which overproduce earlier products in carotenoid biosynthesis,
such as lycopene, γ-carotene, etc.
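The color complementation screen uses a background strain whose accumulated carotenoid sets the reference colony color; a candidate is any colony that departs from it. The Python sketch below records the background plasmids named in the Examples, with colony colors filled in only where the text states them (pink for pAC-LYC, yellow for pAC-BETA). Scoring in the source is visual, so the `Colony` record, the `candidate_hits` helper and the color labels are hypothetical stand-ins for an observer's notes.

```python
from dataclasses import dataclass

# Background plasmids described in the Examples and the carotenoid each
# causes E. coli to accumulate. Colony colors are filled in only where
# the text states them; None means "not stated".
BACKGROUNDS = {
    "pAC-ZETA": ("zeta-carotene", None),
    "pAC-NEUR": ("neurosporene", None),
    "pAC-LYC": ("lycopene", "pink"),
    "pAC-BETA": ("beta-carotene", "yellow"),
}


@dataclass
class Colony:
    plate: str
    position: int
    color: str  # hypothetical label recorded by the person scoring the plate


def candidate_hits(colonies, background):
    """Return colonies whose recorded color differs from the background color."""
    _, expected = BACKGROUNDS[background]
    if expected is None:
        raise ValueError(f"no reference color recorded for {background}")
    return [c for c in colonies if c.color != expected]


plates = [
    Colony("A", 1, "pink"),
    Colony("A", 2, "yellow"),
    Colony("B", 7, "pinkish-yellow"),
]
hits = candidate_hits(plates, "pAC-LYC")
print([(c.plate, c.position) for c in hits])  # [('A', 2), ('B', 7)]
```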

Having generally described this invention, a further 
understanding can be obtained by reference to certain specific 
examples which are provided herein for purposes of 
illustration only and are not intended to be limiting unless 
otherwise specified. 

EXAMPLE 

I. Isolation of β-carotene hydroxylase
Plasmid construction 

An 8.6-kb BglII fragment containing the carotenoid
biosynthetic genes of Erwinia herbicola was first cloned in
the BamHI site of plasmid vector pACYC184 (chloramphenicol
resistant), and then a 1.1-kb BamHI fragment containing the
β-carotene hydroxylase gene (crtZ) was deleted. The resulting
plasmid, pAC-BETA, contains all the genes for the formation of
β-carotene. E. coli strains containing this plasmid accumulate
β-carotene and form yellow colonies (Cunningham et al., 1994).

A full-length gene encoding the IPP isomerase of
Haematococcus pluvialis (HP04) was first cut out of
pBluescript SK+ with BamHI-KpnI and then cloned into a
pTrcHisA vector for high-level expression from the trc promoter
(Invitrogen Inc.). A fragment containing the IPP isomerase
and trc promoter was excised with EcoRV-KpnI and cloned into the
HindIII site of pAC-BETA. E. coli cells transformed with this
new plasmid, pAC-BETA-04, form orange (deep yellow) colonies on
LB plates and accumulate more β-carotene than cells that
contain pAC-BETA.

Screening of the Arabidopsis cDNA Library

Several λ cDNA expression libraries of Arabidopsis were
obtained from the Arabidopsis Biological Resource Center (Ohio
State University, Columbus, OH) (Kieber et al., 1993). The λ
cDNA libraries were excised in vivo using Stratagene's
ExAssist SOLR system to produce a phagemid cDNA library
in which each clone also contained an ampicillin resistance gene.

E. coli strain DH10BZIP was chosen as the host cell for
the screening and pigment production. DH10B cells were
transformed with plasmid pAC-BETA-04 and were plated on LB
agar plates containing chloramphenicol at 50 µg/ml (from
United States Biochemical Corporation). The phagemid
Arabidopsis cDNA library was then introduced into DH10B cells
already containing pAC-BETA-04. Transformed cells containing
both pAC-BETA-04 and Arabidopsis cDNA were selected on
chloramphenicol plus ampicillin (150 µg/ml) agar plates.
Maximum color development occurred after 5 days of incubation at
room temperature, and lighter yellow colonies were selected.
Selected colonies were inoculated into 3 ml of liquid LB medium
containing ampicillin and chloramphenicol, and the cultures were
incubated. Cells were then pelleted and extracted in 80 µl of
100% acetone in microfuge tubes. After centrifugation, the
pigmented supernatant was spotted on silica gel thin-layer
chromatography (TLC) plates and developed with a hexane:ether
(1:1) solvent system. β-Carotene hydroxylase clones
were identified by the appearance of zeaxanthin on the TLC
plate.
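Identification of zeaxanthin on the TLC plate rests on comparing a spot's retardation factor, Rf = (distance travelled by the spot) / (distance travelled by the solvent front), with that of authentic standards run on the same plate. In the following sketch the migration distances and the 0.05 matching tolerance are invented for illustration; they are not values reported for the hexane:ether system used here, and the function names are hypothetical.

```python
def rf(spot_mm, front_mm):
    """Retardation factor: spot distance / solvent-front distance,
    both measured from the origin."""
    if front_mm <= 0 or not 0 <= spot_mm <= front_mm:
        raise ValueError("need front > 0 and 0 <= spot <= front")
    return spot_mm / front_mm


def matching_standards(spot_rf, standards, tol=0.05):
    """Names of co-chromatographed standards whose Rf lies within tol."""
    return [name for name, value in standards.items() if abs(value - spot_rf) <= tol]


# Invented distances (mm) on one plate with an 80 mm solvent front.
standards = {"beta-carotene": rf(76, 80), "zeaxanthin": rf(20, 80)}
print(matching_standards(rf(21, 80), standards))  # ['zeaxanthin']
```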

Subcloning and Sequencing

The β-carotene hydroxylase cDNA was isolated by standard
procedures (Sambrook et al., 1989). Restriction maps showed
that three independent inserts (1.9 kb, 0.9 kb and 0.8 kb)
were present in the cDNA clone. To determine which cDNA insert
confers the β-carotene hydroxylase activity, plasmid DNA was
digested with NotI (a site in the adaptor of the cDNA library)
and the three inserts were subcloned into the NotI site of SK
vectors. These subclones were used to transform E. coli cells
containing pAC-BETA-04 again to test the hydroxylase activity.
A fragment of 0.95 kb, later shown to contain the hydroxylase
gene, was also blunt-ended and cloned into pTrcHis A, B and C
vectors. To remove the N-terminal sequence, a restriction
site (BglII) was used that lies just upstream of the sequence
conserved with the bacterial genes. A BglII-XhoI fragment was
directionally cloned into BamHI-XhoI digested trc vectors.
Functional clones were identified by the color complementation
test. A β-carotene hydroxylase enzyme produces a colony with
a lighter yellow color than is found in cells containing
pAC-BETA-04 alone.
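The subcloning steps rely on BglII (A^GATCT) and BamHI (G^GATCC) leaving the same 5'-GATC overhang, which is why a BglII fragment can be ligated into a BamHI site. The sketch below derives overhangs from the standard recognition sequences of several enzymes named in these Examples and locates sites in a sequence; only enzymes that leave 5' overhangs are included, and the helper names and test sequence are hypothetical.

```python
# Standard recognition sequences (5'->3') and the top-strand cut offset
# within each site. All of these palindromic sites leave 5' overhangs.
ENZYMES = {
    "BamHI": ("GGATCC", 1),
    "BglII": ("AGATCT", 1),
    "XhoI": ("CTCGAG", 1),
    "SalI": ("GTCGAC", 1),
    "HindIII": ("AAGCTT", 1),
    "NotI": ("GCGGCCGC", 2),
}


def overhang(enzyme):
    """5' overhang left by a palindromic site cut `offset` bases in."""
    site, offset = ENZYMES[enzyme]
    return site[offset:len(site) - offset]


def compatible(a, b):
    """Cohesive ends can be ligated when the overhangs are identical."""
    return overhang(a) == overhang(b)


def find_sites(seq, enzyme):
    """0-based start positions of the site; for a palindromic site the
    top-strand search also covers the bottom strand."""
    site, _ = ENZYMES[enzyme]
    seq = seq.upper()
    return [i for i in range(len(seq) - len(site) + 1) if seq.startswith(site, i)]


print(overhang("BglII"), compatible("BglII", "BamHI"))  # GATC True
```

A BglII/BamHI junction (AGATCC or GGATCT) is no longer recognized by either enzyme, so the hybrid site is not recut.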

Arabidopsis β-carotene hydroxylase was sequenced
completely on both strands on an automatic sequencer (Applied 
Biosystems, Model 373A; Version 2.0. IS). 

Pigment Analysis 

A single colony was used to inoculate 50 ml of LB
containing ampicillin and chloramphenicol in a 250-ml flask.
Cultures were incubated at 28 °C for 36 hours with gentle
shaking, and then harvested at 5000 rpm in an SS-34 rotor.
The cells were washed once with distilled H2O and resuspended
in 0.5 ml of water. The extraction procedures and HPLC were
essentially as described previously (Cunningham et al., 1994).
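Centrifugation in the Examples is given as rotor speed; the equivalent relative centrifugal force follows from RCF = 1.118 × 10^-5 × r × rpm², with r the radius in centimetres. The radius of about 10.7 cm used below for the SS-34 rotor is an assumption (its approximate maximum radius) and should be checked against the rotor documentation; the function names are hypothetical.

```python
import math


def rcf(rpm, radius_cm):
    """Relative centrifugal force (x g) = 1.118e-5 * r(cm) * rpm**2."""
    return 1.118e-5 * radius_cm * rpm ** 2


def rpm_for(rcf_target, radius_cm):
    """Rotor speed needed to reach a target RCF at the given radius."""
    return math.sqrt(rcf_target / (1.118e-5 * radius_cm))


SS34_RADIUS_CM = 10.7  # assumed maximum radius; verify for the actual rotor
print(round(rcf(5000, SS34_RADIUS_CM)))
```

At 5000 rpm and r = 10.7 cm this gives roughly 2,990 × g.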

II. Isolation of ε cyclase
Plasmid Construction 

Construction of plasmids pAC-LYC, pAC-NEUR, and pAC-ZETA
is described in Cunningham et al. (1994). In brief, the
appropriate carotenoid biosynthetic genes from Erwinia
herbicola, Rhodobacter capsulatus, and Synechococcus sp.
strain PCC7942 were cloned in the plasmid vector pACYC184 (New
England BioLabs, Beverly, MA). Cultures of E. coli containing
the plasmids pAC-ZETA, pAC-NEUR, and pAC-LYC accumulate
ζ-carotene, neurosporene, and lycopene, respectively. The
plasmid pAC-BETA was constructed as follows: an 8.6-kb BglII
fragment containing the carotenoid biosynthetic genes of E.
herbicola (GenBank M87280; Hundle et al., 1991) was obtained
after partial digestion of plasmid pPL376 (Perry et al., 1986;
Tuveson et al., 1986) and cloned in the BamHI site of pACYC184
to give the plasmid pAC-EHER. Deletion of adjacent 0.8- and
1.1-kb BamHI-BamHI fragments (deletion 2 in Cunningham et al.,
1994) and of a 1.1-kb SalI-SalI fragment (deletion X) served
to remove most of the coding regions for the E. herbicola
β-carotene hydroxylase (crtZ gene) and zeaxanthin
glucosyltransferase (crtX gene), respectively. The resulting
plasmid, pAC-BETA, retains functional genes for geranylgeranyl
pyrophosphate synthase (crtE), phytoene synthase (crtB),
phytoene desaturase (crtI), and lycopene cyclase (crtY).
Cells of E. coli containing this plasmid form yellow colonies
and accumulate β-carotene. A plasmid containing both the ε- and
β-cyclase cDNAs of A. thaliana was constructed by excising
the ε cyclase in clone y2 as a PvuI-PvuII fragment and
ligating this piece into the SnaBI site of a plasmid (pSPORT 1
from GIBCO-BRL) that already contained the β cyclase.



Organisms and Growth Conditions 

E. coli strains TOP10 and TOP10 F' (obtained from
Invitrogen Corporation, San Diego, CA) and XL1-Blue
(Stratagene) were grown in Luria-Bertani (LB) medium (Sambrook
et al., 1989) at 37°C in darkness on a platform shaker at 225
cycles per min. Media components were from Difco (yeast
extract and tryptone) or Sigma (NaCl). Ampicillin at 150
µg/mL and/or chloramphenicol at 50 µg/mL (both from United
States Biochemical Corporation) were used, as appropriate, for
selection and maintenance of plasmids.
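The working antibiotic concentrations above (150 µg/mL ampicillin, 50 µg/mL chloramphenicol) are reached by diluting concentrated stocks according to C1V1 = C2V2. The stock strengths in the sketch (100 mg/mL ampicillin, 34 mg/mL chloramphenicol) are common laboratory choices assumed for illustration, not values given in this document, and the function name is hypothetical.

```python
def stock_volume_ul(final_ug_per_ml, final_volume_ml, stock_mg_per_ml):
    """Volume of stock (uL) for C1*V1 = C2*V2.

    ug/mL x mL gives ug; dividing by a stock in mg/mL (= ug/uL) gives uL.
    """
    if stock_mg_per_ml <= 0:
        raise ValueError("stock concentration must be positive")
    return final_ug_per_ml * final_volume_ml / stock_mg_per_ml


# Working concentrations from the text; stock strengths are assumptions.
amp_ul = stock_volume_ul(150, 500, 100)  # ampicillin, assumed 100 mg/mL stock
cam_ul = stock_volume_ul(50, 500, 34)    # chloramphenicol, assumed 34 mg/mL stock
print(amp_ul, round(cam_ul, 1))  # 750.0 735.3
```

For 500 mL of medium this gives 750 µL of the assumed ampicillin stock and about 735 µL of the assumed chloramphenicol stock.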

Mass Excision and Color Complementation Screening of an A.
thaliana cDNA Library

A size-fractionated 1-2 kb cDNA library of A. thaliana in
lambda ZAPII (Kieber et al., 1993) was obtained from the
Arabidopsis Biological Resource Center at The Ohio State
University (stock number CD4-14). Other size-fractionated
libraries were also obtained (stock numbers CD4-13, CD4-15,
and CD4-16). An aliquot of each library was treated to cause
a mass excision of the cDNAs and thereby produce a phagemid
library according to the instructions provided by the supplier
of the cloning vector (Stratagene; E. coli strain XL1-Blue and
the helper phage R408 were used). The titre of the excised
phagemid was determined and the library was introduced into a
lycopene-accumulating strain of E. coli TOP10 F' (this strain
contained the plasmid pAC-LYC) by incubation of the phagemid
with the E. coli cells for 15 min at 37°C. Cells had been
grown overnight at 30°C in LB medium supplemented with 2%
(w/v) maltose and 10 mM MgSO4 (final concentration), and
harvested in 1.5-ml microfuge tubes at a setting of 3 on an
Eppendorf microfuge (5415C) for 10 min. The pellets were
resuspended in 10 mM MgSO4 to a volume equal to one-half that
of the initial culture volume. Transformants were spread on
large (150 mm diameter) LB agar petri plates containing
antibiotics to provide for selection of cDNA clones
(ampicillin) and maintenance of pAC-LYC (chloramphenicol).
Approximately 10,000 colony-forming units were spread on each
plate. Petri plates were incubated at 37°C for 16 hr and then
at room temperature for 2 to 7 days to allow maximum color
development. Plates were screened visually with the aid of an
illuminated 3x magnifier and a low-power stage-dissecting
microscope for the rare, pale pinkish-yellow to deep-yellow
colonies that could be observed in the background of pink
colonies. A colony color of yellow or pinkish-yellow was
taken as presumptive evidence of a cyclization activity.
These yellow colonies were collected with sterile toothpicks
and used to inoculate 3 ml of LB medium in culture tubes, with
overnight growth at 37°C and shaking at 225 cycles/min.
Cultures were split into two aliquots in microfuge tubes and
harvested by centrifugation at a setting of 5 in an Eppendorf
5415C microfuge. After the liquid was discarded, one pellet was
frozen for later purification of plasmid DNA. The second
pellet was resuspended in 1.5 ml of ethanol (EtOH) by vortex
mixing, and extraction was allowed to proceed in the dark for
15-30 min with occasional remixing. Insoluble materials were
pelleted by centrifugation at maximum speed for 10 min in a
microfuge. Absorption spectra of the supernatant fluids were
recorded from 350 to 550 nm with a Perkin Elmer lambda six
spectrophotometer.
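Pigments in the ethanol extracts were identified by the positions of their absorption maxima. The text does not say how the maxima were read from the recorded spectra. As a hedged sketch, the code below finds local maxima in a spectrum sampled at regular wavelengths, assuming 1-nm steps. The three-band Gaussian spectrum used to exercise it is invented test data and is not a real carotenoid spectrum.

```python
import math


def local_maxima(wavelengths, absorbance, min_abs=0.0):
    """Wavelengths where absorbance exceeds both neighbours and is >= min_abs.

    Endpoints are never reported. A flat-topped peak (two equal adjacent
    maxima) is not detected by this simple strict comparison.
    """
    peaks = []
    for i in range(1, len(absorbance) - 1):
        a = absorbance[i]
        if a > absorbance[i - 1] and a > absorbance[i + 1] and a >= min_abs:
            peaks.append(wavelengths[i])
    return peaks


def synthetic_spectrum(wavelengths, bands):
    """Sum of Gaussian bands given as (centre_nm, height, width_nm); test data only."""
    return [sum(h * math.exp(-((w - c) / s) ** 2) for c, h, s in bands)
            for w in wavelengths]


wl = list(range(350, 551))  # 350-550 nm, assumed 1-nm sampling
spectrum = synthetic_spectrum(wl, [(420, 0.5, 8), (450, 1.0, 8), (480, 0.8, 8)])
print(local_maxima(wl, spectrum, min_abs=0.05))  # [420, 450, 480]
```

The `min_abs` threshold is a simple guard against reporting noise in the baseline of a real scan. Smoothing the data first would be a reasonable refinement.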

Analysis of isolated clones 

Eight of the yellow colonies contained β-carotene,
indicating that a single gene product catalyzes both
cyclizations required to form the two β endgroups of the
symmetrical β-carotene from the symmetrical precursor
lycopene. One of the yellow colonies contained a pigment with
the spectrum characteristic of δ-carotene, a monocyclic
carotenoid with a single ε endgroup. Unlike the β cyclase,
this ε cyclase appears unable to carry out a second
cyclization at the other end of the molecule.

The observation that the ε cyclase is unable to form two
cyclic ε endgroups (e.g., the bicyclic ε-carotene) illuminates
the mechanism by which plants can coordinate and control the
flow of substrate into carotenoids derived from β-carotene
versus those derived from α-carotene, and can also prevent the
formation of carotenoids with two ε endgroups.
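The cyclization behavior described in the two preceding paragraphs can be summarized as a small state model over the two ends of lycopene. This is an illustrative simplification, not the enzymes' mechanism. Each end is treated as acyclic (ψ) or as carrying a β or ε ring. Either cyclase may cyclize any acyclic end, except that the ε cyclase is blocked when the other end is already an ε ring. Letting either cyclase act first, and ignoring any substrate preference, are assumptions of the sketch. The product names follow standard endgroup nomenclature (for example, δ-carotene is ε,ψ-carotene).

```python
PSI, BETA, EPS = "psi", "beta", "epsilon"  # acyclic end, beta ring, epsilon ring

# Trivial names for each unordered pair of endgroups (sorted tuples as keys).
NAMES = {
    ("psi", "psi"): "lycopene",
    ("beta", "psi"): "gamma-carotene",
    ("beta", "beta"): "beta-carotene",
    ("epsilon", "psi"): "delta-carotene",
    ("beta", "epsilon"): "alpha-carotene",
    ("epsilon", "epsilon"): "epsilon-carotene",
}


def key(ends):
    """Canonical (sorted) form of a pair of endgroups."""
    return tuple(sorted(ends))


def step(ends, cyclase):
    """States produced by one cyclization; `cyclase` names the ring it forms."""
    products = set()
    for i in (0, 1):
        if ends[i] != PSI:
            continue  # only an acyclic end can be cyclized
        if cyclase == EPS and ends[1 - i] == EPS:
            continue  # modelled rule: no second epsilon ring
        new = list(ends)
        new[i] = cyclase
        products.add(key(new))
    return products


def reachable(cyclases):
    """Names of all states reachable from lycopene with the given cyclases."""
    start = key((PSI, PSI))
    seen, frontier = {start}, [start]
    while frontier:
        state = frontier.pop()
        for cyclase in cyclases:
            for nxt in step(state, cyclase):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
    return {NAMES[s] for s in seen}


print(sorted(reachable([BETA])))       # ['beta-carotene', 'gamma-carotene', 'lycopene']
print(sorted(reachable([EPS])))        # ['delta-carotene', 'lycopene']
print(sorted(reachable([BETA, EPS])))  # adds alpha-carotene; never epsilon-carotene
```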

The availability of the A. thaliana gene encoding the ε
cyclase enables the directed manipulation of plant and algal
species for modification of carotenoid content and
composition. Through inactivation of the ε cyclase, whether
at the gene level by deletion or insertional inactivation of
the gene, or by reduction of the amount of enzyme formed (for
example, by antisense technology), one may increase the
formation of β-carotene and other pigments derived from it.
Since vitamin A is derived only from carotenoids with β
endgroups, an enhancement of the production of β-carotene
versus α-carotene may enhance the nutritional value of crop
plants. Reduction of carotenoids with ε endgroups may also be
of value in modifying the color properties of crop plants and
of specific tissues of these plants. Alternatively, where
production of α-carotene, or of pigments such as lutein that
are derived from α-carotene, is desirable, whether for color
properties, nutritional value or another reason, one may
overexpress the ε cyclase or express it in specific tissues.
Wherever the agronomic value of a crop is related to
pigmentation provided by carotenoid pigments, the directed
manipulation of expression of the ε cyclase gene and/or
production of the enzyme may be of commercial value.

The predicted amino acid sequence of the A. thaliana ε
cyclase enzyme was determined. A comparison of the amino acid
sequences of the β and ε cyclase enzymes of Arabidopsis
thaliana (Fig. 13), as predicted by the DNA sequences of the
respective genes (Fig. 4 for the ε cyclase cDNA sequence),
indicates that these two enzymes have many regions of sequence
similarity, but they are only about 37% identical overall at
the amino acid level. The degree of sequence identity at the
DNA base level, only about 50%, is sufficiently low that
we and others have been unable to detect this gene by
hybridization using the β cyclase as a probe in DNA gel blot
experiments.
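Percent-identity figures such as the ~37% (amino acid) and ~50% (DNA) values quoted above depend on the alignment used and on how gaps are counted. The sketch below computes identity from two sequences that have already been aligned, with gaps written as '-'. It divides by the number of columns in which neither sequence has a gap. That is one common convention among several and is not necessarily the one behind Fig. 13. The toy strings in the check are invented.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # gapped columns are excluded from the denominator
        compared += 1
        identical += x == y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


print(percent_identity("ACGT-A", "ACCTGA"))  # 80.0 (4 of 5 ungapped columns match)
```

Dividing by the full alignment length, or by the length of the shorter sequence, would give lower values for gapped alignments. The convention should therefore be stated alongside any identity figure.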
Having now fully described the invention, it will be
apparent to one of ordinary skill in the art that many changes 
and modifications can be made thereto without departing from 
the spirit or scope of the invention as set forth herein. 
(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1860 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 109..1680

(D) OTHER INFORMATION: /product= "E-CYCLASE FROM A.
THALIANA"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

ACAAAAGGAA ATAATTAGAT TCCTCTTTCT GCTTGCTATA CCTTGATAGA ACAATATAAC 60 

AATGGTGTAA GTCTTCTCGC TGTATTCGAA ATTATTTGGA GGAGGAAA ATG GAG TGT 117 

Met Glu Cys 
1 

GTT GGG GCT AGG AAT TTC GCA GCA ATG GCG GTT TCA ACA TTT CCG TCA 165 
Val Gly Ala Arg Asn Phe Ala Ala Met Ala Val Ser Thr Phe Pro Ser 
5 10 15 

TGG AGT TGT CGA AGG AAA TTT CCA GTG GTT AAG AGA TAC AGC TAT AGG 213 
Trp Ser Cys Arg Arg Lys Phe Pro Val Val Lys Arg Tyr Ser Tyr Arg 
20 25 30 35 



AAT ATT CGT TTC GGT TTG TGT AGT GTC AGA GCT AGC GGC GGC GGA AGT 261
Asn Ile Arg Phe Gly Leu Cys Ser Val Arg Ala Ser Gly Gly Gly Ser
40 45 50

TCC GGT AGT GAG AGT TGT GTA GCG GTG AGA GAA GAT TTC GCT GAC GAA 309
Ser Gly Ser Glu Ser Cys Val Ala Val Arg Glu Asp Phe Ala Asp Glu
55 60 65

GAA GAT TTT GTG AAA GCT GGT GGT TCT GAG ATT CTA TTT GTT CAA ATG 357
Glu Asp Phe Val Lys Ala Gly Gly Ser Glu Ile Leu Phe Val Gln Met
70 75 80

CAG CAG AAC AAA GAT ATG GAT GAA CAG TCT AAG CTT GTT GAT AAG TTG 405
Gln Gln Asn Lys Asp Met Asp Glu Gln Ser Lys Leu Val Asp Lys Leu
85 90 95

CCT CCT ATA TCA ATT GGT GAT GGT GCT TTG GAT CAT GTG GTT ATT GGT 453
Pro Pro Ile Ser Ile Gly Asp Gly Ala Leu Asp His Val Val Ile Gly
100 105 110 115

TGT GGT CCT GCT GGT TTA GCC TTG GCT GCA GAA TCA GCT AAG CTT GGA 501
Cys Gly Pro Ala Gly Leu Ala Leu Ala Ala Glu Ser Ala Lys Leu Gly
120 125 130

TTA AAA GTT GGA CTC ATT GGT CCA GAT CTT CCT TTT ACT AAC AAT TAC 549
Leu Lys Val Gly Leu Ile Gly Pro Asp Leu Pro Phe Thr Asn Asn Tyr
135 140 145

GGT GTT TGG GAA GAT GAA TTC AAT GAT CTT GGG CTG CAA AAA TGT ATT 597
Gly Val Trp Glu Asp Glu Phe Asn Asp Leu Gly Leu Gln Lys Cys Ile
150 155 160

GAG CAT GTT TGG AGA GAG ACT ATT GTG TAT CTG GAT GAT GAC AAG CCT 645
Glu His Val Trp Arg Glu Thr Ile Val Tyr Leu Asp Asp Asp Lys Pro
165 170 175

ATT ACC ATT GGC CGT GCT TAT GGA AGA GTT AGT CGA CGT TTG CTC CAT 693
Ile Thr Ile Gly Arg Ala Tyr Gly Arg Val Ser Arg Arg Leu Leu His
180 185 190 195

SUBSTITUTE SHEET (RULE 26)

WO 97/36998 PCT/US97/00540

34

GAG GAG CTT TTG AGG AGG TGT GTC GAG TCA GGT GTC TCG TAC CTT AGC 741
Glu Glu Leu Leu Arg Arg Cys Val Glu Ser Gly Val Ser Tyr Leu Ser
200 205 210

TCG AAA GTT GAC AGC ATA ACA GAA GCT TCT GAT GGC CTT AGA CTT GTT 789
Ser Lys Val Asp Ser Ile Thr Glu Ala Ser Asp Gly Leu Arg Leu Val
215 220 225

GCT TGT GAC GAC AAT AAC GTC ATT CCC TGC AGG CTT GCC ACT GTT GCT 837
Ala Cys Asp Asp Asn Asn Val Ile Pro Cys Arg Leu Ala Thr Val Ala
230 235 240

TCT GGA GCA GCT TCG GGA AAG CTC TTG CAA TAC GAA GTT GGT GGA CCT 885
Ser Gly Ala Ala Ser Gly Lys Leu Leu Gln Tyr Glu Val Gly Gly Pro
245 250 255

AGA GTC TGT GTG CAA ACT GCA TAC GGC GTG GAG GTT GAG GTG GAA AAT 933
Arg Val Cys Val Gln Thr Ala Tyr Gly Val Glu Val Glu Val Glu Asn
260 265 270 275

AGT CCA TAT GAT CCA GAT CAA ATG GTT TTC ATG GAT TAC AGA GAT TAT 981
Ser Pro Tyr Asp Pro Asp Gln Met Val Phe Met Asp Tyr Arg Asp Tyr
280 285 290

ACT AAC GAG AAA GTT CGG AGC TTA GAA GCT GAG TAT CCA ACG TTT CTG 1029
Thr Asn Glu Lys Val Arg Ser Leu Glu Ala Glu Tyr Pro Thr Phe Leu
295 300 305

TAC GCC ATG CCT ATG ACA AAG TCA AGA CTC TTC TTC GAG GAG ACA TGT 1077
Tyr Ala Met Pro Met Thr Lys Ser Arg Leu Phe Phe Glu Glu Thr Cys
310 315 320

TTG GCC TCA AAA GAT GTC ATG CCC TTT GAT TTG CTA AAA ACG AAG CTC 1125
Leu Ala Ser Lys Asp Val Met Pro Phe Asp Leu Leu Lys Thr Lys Leu
325 330 335

ATG TTA AGA TTA GAT ACA CTC GGA ATT CGA ATT CTA AAG ACT TAC GAA 1173
Met Leu Arg Leu Asp Thr Leu Gly Ile Arg Ile Leu Lys Thr Tyr Glu
340 345 350 355

GAG GAG TGG TCC TAT ATC CCA GTT GGT GGT TCC TTG CCA AAC ACC GAA 1221
Glu Glu Trp Ser Tyr Ile Pro Val Gly Gly Ser Leu Pro Asn Thr Glu
360 365 370

CAA AAG AAT CTC GCC TTT GGT GCT GCC GCT AGC ATG GTA CAT CCC GCA 1269
Gln Lys Asn Leu Ala Phe Gly Ala Ala Ala Ser Met Val His Pro Ala
375 380 385

ACA GGC TAT TCA GTT GTG AGA TCT TTG TCT GAA GCT CCA AAA TAT GCA 1317
Thr Gly Tyr Ser Val Val Arg Ser Leu Ser Glu Ala Pro Lys Tyr Ala
390 395 400

TCA GTC ATC GCA GAG ATA CTA AGA GAA GAG ACT ACC AAA CAG ATC AAC 1365
Ser Val Ile Ala Glu Ile Leu Arg Glu Glu Thr Thr Lys Gln Ile Asn
405 410 415

AGT AAT ATT TCA AGA CAA GCT TGG GAT ACT TTA TGG CCA CCA GAA AGG 1413
Ser Asn Ile Ser Arg Gln Ala Trp Asp Thr Leu Trp Pro Pro Glu Arg
420 425 430 435

AAA AGA CAG AGA GCA TTC TTT CTC TTT GGT CTT GCA CTC ATA GTT CAA 1461
Lys Arg Gln Arg Ala Phe Phe Leu Phe Gly Leu Ala Leu Ile Val Gln
440 445 450

TTC GAT ACC GAA GGC ATT AGA AGC TTC TTC CGT ACT TTC TTC CGC CTT 1509
Phe Asp Thr Glu Gly Ile Arg Ser Phe Phe Arg Thr Phe Phe Arg Leu
455 460 465

CCA AAA TGG ATG TGG CAA GGG TTT CTA GGA TCA ACA TTA ACA TCA GGA 1557
Pro Lys Trp Met Trp Gln Gly Phe Leu Gly Ser Thr Leu Thr Ser Gly
470 475 480

GAT CTC GTT CTC TTT GCT TTA TAC ATG TTC GTC ATT TCA CCA AAC AAT 1605
Asp Leu Val Leu Phe Ala Leu Tyr Met Phe Val Ile Ser Pro Asn Asn
485 490 495

TTG AGA AAA GGT CTC ATC AAT CAT CTC ATC TCT GAT CCA ACC GGA GCA 1653
Leu Arg Lys Gly Leu Ile Asn His Leu Ile Ser Asp Pro Thr Gly Ala
500 505 510 515

ACC ATG ATA AAA ACC TAT CTC AAA GTA TGATTTACTT ATCAACTCTT 1700
Thr Met Ile Lys Thr Tyr Leu Lys Val
520

AGGTTTGTGT ATATATATGT TGATTTATCT GAATAATCGA TCAAAGAATG GTATGTGGGT 1760 

TACTAGGAAG TTGGAAACAA ACATGTATAG AATCTAAGGA GTGATCGAAA TGGAGATGGA 1820 

AACGAAAAGA AAAAAATCAG TCTTTGTTTT GTGGTTAGTG 1860 
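The CDS feature of SEQ ID NO:1 spans nucleotides 109..1680. That is 1680 - 109 + 1 = 1572 nt, or 524 codons, which agrees with the 524-residue length given for SEQ ID NO:2. The TGA stop codon follows at 1681-1683 and is not part of the feature. The sketch below is a consistency check and is not part of the original listing. It translates coding sequence with the standard genetic code and is applied only to short stretches copied from the listing above. The helper names are hypothetical.

```python
from itertools import product

# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codon_count(start, end):
    """Codons in a 1-based, inclusive feature location start..end."""
    length = end - start + 1
    if length % 3:
        raise ValueError("feature length is not a multiple of 3")
    return length // 3


def translate(cds):
    """Translate DNA (spaces allowed) in frame 1, stopping at the first stop codon.

    A trailing partial codon, if any, is ignored.
    """
    cds = cds.upper().replace(" ", "")
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(codon_count(109, 1680))                               # 524
print(translate("ATG GAG TGT GTT GGG GCT AGG AAT TTC"))      # MECVGARNF
print(translate("ACC ATG ATA AAA ACC TAT CTC AAA GTA TGA"))  # TMIKTYLKV
```

The first fragment corresponds to Met Glu Cys Val Gly Ala Arg Asn Phe at the start of SEQ ID NO:2. The second corresponds to its final nine residues, followed by the stop codon.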



(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 524 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

Met Glu Cys Val Gly Ala Arg Asn Phe Ala Ala Met Ala Val Ser Thr
1 5 10 15

Phe Pro Ser Trp Ser Cys Arg Arg Lys Phe Pro Val Val Lys Arg Tyr
20 25 30

Ser Tyr Arg Asn Ile Arg Phe Gly Leu Cys Ser Val Arg Ala Ser Gly
35 40 45

Gly Gly Ser Ser Gly Ser Glu Ser Cys Val Ala Val Arg Glu Asp Phe
50 55 60

Ala Asp Glu Glu Asp Phe Val Lys Ala Gly Gly Ser Glu Ile Leu Phe
65 70 75 80

Val Gln Met Gln Gln Asn Lys Asp Met Asp Glu Gln Ser Lys Leu Val
85 90 95

Asp Lys Leu Pro Pro Ile Ser Ile Gly Asp Gly Ala Leu Asp His Val
100 105 110

Val Ile Gly Cys Gly Pro Ala Gly Leu Ala Leu Ala Ala Glu Ser Ala
115 120 125

Lys Leu Gly Leu Lys Val Gly Leu Ile Gly Pro Asp Leu Pro Phe Thr
130 135 140

Asn Asn Tyr Gly Val Trp Glu Asp Glu Phe Asn Asp Leu Gly Leu Gln
145 150 155 160

Lys Cys Ile Glu His Val Trp Arg Glu Thr Ile Val Tyr Leu Asp Asp
165 170 175

Asp Lys Pro Ile Thr Ile Gly Arg Ala Tyr Gly Arg Val Ser Arg Arg
180 185 190

Leu Leu His Glu Glu Leu Leu Arg Arg Cys Val Glu Ser Gly Val Ser
195 200 205

Tyr Leu Ser Ser Lys Val Asp Ser Ile Thr Glu Ala Ser Asp Gly Leu
210 215 220

Arg Leu Val Ala Cys Asp Asp Asn Asn Val Ile Pro Cys Arg Leu Ala
225 230 235 240

Thr Val Ala Ser Gly Ala Ala Ser Gly Lys Leu Leu Gln Tyr Glu Val
245 250 255

Gly Gly Pro Arg Val Cys Val Gln Thr Ala Tyr Gly Val Glu Val Glu
260 265 270

Val Glu Asn Ser Pro Tyr Asp Pro Asp Gln Met Val Phe Met Asp Tyr
275 280 285

Arg Asp Tyr Thr Asn Glu Lys Val Arg Ser Leu Glu Ala Glu Tyr Pro
290 295 300

Thr Phe Leu Tyr Ala Met Pro Met Thr Lys Ser Arg Leu Phe Phe Glu
305 310 315 320

Glu Thr Cys Leu Ala Ser Lys Asp Val Met Pro Phe Asp Leu Leu Lys
325 330 335

Thr Lys Leu Met Leu Arg Leu Asp Thr Leu Gly Ile Arg Ile Leu Lys
340 345 350

Thr Tyr Glu Glu Glu Trp Ser Tyr Ile Pro Val Gly Gly Ser Leu Pro
355 360 365

Asn Thr Glu Gln Lys Asn Leu Ala Phe Gly Ala Ala Ala Ser Met Val
370 375 380

His Pro Ala Thr Gly Tyr Ser Val Val Arg Ser Leu Ser Glu Ala Pro
385 390 395 400

Lys Tyr Ala Ser Val Ile Ala Glu Ile Leu Arg Glu Glu Thr Thr Lys
405 410 415

Gln Ile Asn Ser Asn Ile Ser Arg Gln Ala Trp Asp Thr Leu Trp Pro
420 425 430

Pro Glu Arg Lys Arg Gln Arg Ala Phe Phe Leu Phe Gly Leu Ala Leu
435 440 445

Ile Val Gln Phe Asp Thr Glu Gly Ile Arg Ser Phe Phe Arg Thr Phe
450 455 460

Phe Arg Leu Pro Lys Trp Met Trp Gln Gly Phe Leu Gly Ser Thr Leu
465 470 475 480

Thr Ser Gly Asp Leu Val Leu Phe Ala Leu Tyr Met Phe Val Ile Ser
485 490 495

Pro Asn Asn Leu Arg Lys Gly Leu Ile Asn His Leu Ile Ser Asp Pro
500 505 510

Thr Gly Ala Thr Met Ile Lys Thr Tyr Leu Lys Val
515 520 

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 956 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:
GCTCTTTCTC CTCCTCCTCT ACCGATTTCC GACTCCGCCT CCCGAAATCC TTATCCGGAT 60
TCTCTCCGTC TCTTCGATTT AAACGCTTTT CTGTCTGTTA CGTCGTCGAA GAACGGAGAC 120
AGAATTCTCC GATTGAGAAC GATGAGAGAC CGGAGAGCAC GAGCTCCACA AACGCTATAG 180
ACGCTGAGTA TCTGGCGTTG CGTTTGGCGG AGAAATTGGA GAGGAAGAAA TCGGAGAGGT 240
CCACTTATCT AATCGCTGCT ATGTTGTCGA GCTTTGGTAT CACTTCTATG GCTGTTATGG 300
CTGTTTACTA CAGATTCTCT TGGCAAATGG AGGGAGGTGA GATCTCAATG TTGGAAATGT 360
TTGGTACATT TGCTCTCTCT GTTGGTGCTG CTGTTGGTAT GGAATTCTGG GCAAGATGGG 420

CTCATAGAGC TCTGTGGCAC GCTTCTCTAT GGAATATGCA TGAGTCACAT CACAAACCAA 480 

GAGAAGGACC GTTTGAGCTA AACGATGTTT TTGCTATAGT GAACGCTGGT CCAGCGATTG 540 

GTCTCCTCTC TTATGGATTC TTCAATAAAG GACTCGTTCC TGGTCTCTGC TTTGGCGCCG 600 

GGTTAGGCAT AACGGTGTTT GGAATCGCCT ACATGTTTGT CCACGATGGT CTCGTGCACA 660 

AGCGTTTCCC TGTAGGTCCC ATCGCCGACG TCCCTTACCT CCGAAAGGTC GCCGCCGCTC 720 

ACCAGCTACA TCACACAGAC AAGTTCAATG GTGTACCATA TGGACTGTTT CTTGGACCCA 780 

AGGAATTGGA AGAAGTTGGA GGAAATGAAG AGTTAGATAA GGAGATTAGT CGGAGAATCA 840 

AATCATACAA AAAGGCCTCG GGCTCCGGGT CGAGTTCGAG TTCTTGACTT TAAACAAGTT 900 

TTAAATCCCA AATTCTTTTT TTGTCTTCTG TCATTATGAT CATCTTAAGA CGGTCT 956 
(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 294 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4: 

Ser Phe Ser Ser Ser Ser Thr Asp Phe Arg Leu Arg Leu Pro Lys Ser
1 5 10 15

Leu Ser Gly Phe Ser Pro Ser Leu Arg Phe Lys Arg Phe Ser Val Cys
20 25 30

Tyr Val Val Glu Glu Arg Arg Gln Asn Ser Pro Ile Glu Asn Asp Glu
35 40 45

Arg Pro Glu Ser Thr Ser Ser Thr Asn Ala Ile Asp Ala Glu Tyr Leu
50 55 60

Ala Leu Arg Leu Ala Glu Lys Leu Glu Arg Lys Lys Ser Glu Arg Ser
65 70 75 80

Thr Tyr Leu Ile Ala Ala Met Leu Ser Ser Phe Gly Ile Thr Ser Met
85 90 95

Ala Val Met Ala Val Tyr Tyr Arg Phe Ser Trp Gln Met Glu Gly Gly
100 105 110

Glu Ile Ser Met Leu Glu Met Phe Gly Thr Phe Ala Leu Ser Val Gly
115 120 125

Ala Ala Val Gly Met Glu Phe Trp Ala Arg Trp Ala His Arg Ala Leu
130 135 140

Trp His Ala Ser Leu Trp Met Asn His Glu Ser His His Lys Pro Arg
145 150 155 160

Glu Gly Pro Phe Glu Leu Asn Asp Val Phe Ala Ile Val Asn Ala Gly
165 170 175

Pro Ala Ile Gly Leu Leu Ser Tyr Gly Phe Phe Asn Lys Gly Leu Val
180 185 190

Pro Gly Leu Cys Phe Gly Ala Gly Leu Gly Ile Thr Val Phe Gly Ile
195 200 205

Ala Tyr Met Phe Val His Asp Gly Leu Val His Lys Arg Phe Pro Val
210 215 220

Gly Pro Ile Ala Asp Val Pro Tyr Leu Arg Lys Val Ala Ala Ala His
225 230 235 240

Gln Leu His His Thr Asp Lys Phe Asn Gly Val Pro Tyr Gly Leu Phe
245 250 255

Leu Gly Pro Lys Glu Leu Glu Glu Val Gly Gly Asn Glu Glu Leu Asp
260 265 270

Lys Glu Ile Ser Arg Arg Ile Lys Ser Tyr Lys Lys Ala Ser Gly Ser
275 280 285

Gly Ser Ser Ser Ser Ser 
290 

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 162 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

Met Thr Gln Phe Leu Ile Val Val Ala Thr Val Leu Val Met Glu Leu
1 5 10 15

Thr Ala Tyr Ser Val His Arg Trp Ile Met His Gly Pro Leu Gly Trp
20 25 30

Gly Trp His Lys Ser His His Glu Glu His Asp His Ala Leu Glu Lys
35 40 45

Asn Asp Leu Tyr Gly Val Val Phe Ala Val Leu Ala Thr Ile Leu Phe
50 55 60

Thr Val Gly Ala Tyr Trp Trp Pro Val Leu Trp Trp Ile Ala Leu Gly
65 70 75 80

Met Thr Val Tyr Gly Leu Ile Tyr Phe Ile Leu His Asp Gly Leu Val
85 90 95

His Gln Arg Trp Pro Phe Arg Tyr Ile Pro Arg Arg Gly Tyr Phe Arg
100 105 110

Arg Leu Tyr Gln Ala His Arg Leu His His Ala Val Glu Gly Arg Asp
115 120 125

His Cys Val Ser Phe Gly Phe Ile Tyr Ala Pro Pro Val Asp Lys Leu
130 135 140

Lys Gln Asp Leu Lys Arg Ser Gly Val Leu Arg Pro Gln Asp Glu Arg
145 150 155 160

Pro Ser



(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 175 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

Met Leu Asn Ser Leu Ile Val Ile Leu Ser Val Ile Ala Met Glu Gly
1 5 10 15

Ile Ala Ala Phe Thr His Arg Tyr Ile Met His Gly Trp Gly Trp Arg
20 25 30

Trp His Glu Ser His His Thr Pro Arg Lys Gly Val Phe Glu Leu Asn
35 40 45

Asp Leu Phe Ala Val Val Phe Ala Gly Val Ala Ile Ala Leu Ile Ala
50 55 60

Val Gly Thr Ala Gly Val Trp Pro Leu Gln Trp Ile Gly Cys Gly Met
65 70 75 80

Thr Val Tyr Gly Leu Leu Tyr Phe Leu Val His Asp Gly Leu Val His
85 90 95

Gln Arg Trp Pro Phe His Trp Ile Pro Arg Arg Gly Tyr Leu Lys Arg
100 105 110

Leu Tyr Val Ala His Arg Leu His His Ala Val Arg Gly Arg Glu Gly
115 120 125

Cys Val Ser Phe Gly Phe Ile Tyr Ala Arg Lys Pro Ala Asp Leu Gln
130 135 140

Ala Ile Leu Arg Glu Arg His Gly Arg Pro Pro Lys Arg Asp Ala Ala
145 150 155 160

Lys Asp Arg Pro Asp Ala Ala Ser Pro Ser Ser Ser Ser Pro Glu
165 170 175

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 175 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

Met Leu Trp Ile Trp Asn Ala Leu Ile Val Phe Val Thr Val Ile Gly
1 5 10 15

Met Glu Val Ile Ala Ala Leu Ala His Lys Tyr Ile Met His Gly Trp
20 25 30

Gly Trp Gly Trp His Leu Ser His His Glu Pro Arg Lys Gly Ala Phe
35 40 45

Glu Val Asn Asp Leu Tyr Ala Val Val Phe Ala Ala Leu Ser Ile Leu
50 55 60

Leu Ile Tyr Leu Gly Ser Thr Gly Met Trp Pro Leu Gln Trp Ile Gly
65 70 75 80

Ala Gly Met Thr Ala Tyr Gly Leu Leu Tyr Phe Met Val His Asp Gly
85 90 95

Leu Val His Gln Arg Trp Pro Phe Arg Tyr Ile Pro Arg Lys Gly Tyr
100 105 110

Leu Lys Arg Leu Tyr Met Ala His Arg Met His His Ala Val Arg Gly
115 120 125

Lys Glu Gly Cys Val Ser Phe Gly Phe Leu Tyr Ala Pro Pro Leu Ser
130 135 140

Lys Leu Gln Ala Thr Leu Arg Glu Arg His Gly Ala Arg Ala Gly Ala
145 150 155 160

Ala Arg Asp Ala Gln Gly Gly Glu Asp Glu Pro Ala Ser Gly Lys
165 170 175

(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 162 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8: 

Met Thr Asn Phe Leu Ile Val Val Ala Thr Val Leu Val Met Glu Leu
1 5 10 15

Thr Ala Tyr Ser Val His Arg Trp Ile Met His Gly Pro Leu Gly Trp
20 25 30

Gly Trp His Lys Ser His His Glu Glu His Asp His Ala Leu Glu Lys
35 40 45

Asn Asp Leu Tyr Gly Leu Val Phe Ala Val Ile Ala Thr Val Leu Phe
50 55 60

Thr Val Gly Trp Ile Trp Ala Pro Val Leu Trp Trp Ile Ala Leu Gly
65 70 75 80

Met Thr Val Tyr Gly Leu Ile Tyr Phe Val Leu His Asp Gly Leu Val
85 90 95

His Trp Arg Trp Pro Phe Arg Tyr Ile Pro Arg Lys Gly Tyr Ala Arg
100 105 110

Arg Leu Tyr Gln Ala His Arg Leu His His Ala Val Glu Gly Arg Asp
115 120 125

His Cys Val Ser Phe Gly Phe Ile Tyr Ala Pro Pro Val Asp Lys Leu
130 135 140

Lys Gln Asp Leu Lys Met Ser Gly Val Leu Arg Ala Glu Ala Gln Glu
145 150 155 160 

Arg Thr 



(2) INFORMATION FOR SEQ ID NO : 9 : 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 954 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

CCACGGGTCC GCCTCCCCGT TTTTTTCCGA TCCRATfTPP nrzTrrr<r>7ir>r> ACTCAGCTGT 60
TTGTTCGCGC TTTCTCAGCC GTCACCATGA CCGATTCTaa rn&TrrTpr<A ATGGATGCTG 120
TTCAGAGACG ACTCATGTTT GAAGACGAAT GCATTCTCGT TGATGAAAAT AATCGTGTGG 180
TGGGACATGA CACTAAGTAT AACTGTCATC TGATGGAAAA GATTGAAGCT GAGAATTTAC 240
TTCACAGAGC TTTCAGTGTG TTTTTATTCA ACTCCAAGTA TGAGTTGCTT CTCCAGCAAC 300
GGTCAAAAAC AAAGGTTACT TTCCCACTTG TGTGGACAAA CACTTGTTGC AGCCATCCTC 360
TTTACCGTGA ATCCGAGCTT ATTGAAGAGA ATGTGCTTGG TGTAAGAAAT GCCGCACAAA 420
GGAAGCTTTT CGATGAGCTC GGTATTGTAG CAGAAGATGT ACCAGTCGAT GAGTTCACTC 480
CCTTGGGACG CATGCTTTAC AAGGCACCTT CTGATGGGAA ATGGGGAGAG CACGAAGTTG 540
ACTATCTACT CTTCATCGTG CGGGATGTGA AGCTTCAACC AAACCCAGAT GAAGTGGCTG 600
AGATCAAGTA CGTGAGCAGG GAAGAGCTTA AGGAGCTGGT GAAGAAAGCA GATGCTGGCG 660
ATGAAGCTGT GAAACTATCT CCATGGTTCA GATTGGTGGT GGATAATTTC TTGATGAAGT 720
GGTGGGATCA TGTTGAGAAA GGAACTATCA CTGAAGCTGC AGACATGAAA ACCATTCACA 780
AGCTCTGAAC TTTCCATAAG TTTTGGATCT TCCCCTTCCC ATAATAAAAT TAAGAGATGA 840
GACTTTTATT GATTACAGAC AAAACTGGCA ACAAAATCTA TTCCTAGGAT TTTTTTTTGC 900
TTTTTATTTA CTTTTGATTC ATCTCTAGTT TAGTTTTCAT CTTAAAAAAA AAAA 954


(2) INFORMATION FOR SEQ ID NO: 10: 







(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 996 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

CACCAATGTC TGTTTCTTCT TTATTTAATC TCCCATTGAT TCGCCTCAGA TCTCTCGCTC 60 

TTTCGTCTTC TTTTTCTTCT TTCCGATTTG CCCATCGTCC TCTGTCATCG ATTTCACCGA 120 

GAAAGTTACC GAATTTTCGT GCTTTCTCTG GTACCGCTAT GACAGATACT AAAGATGCTG 180 

GTATGGATGC TGTTCAGAGA CGTCTCATGT TTGAGGATGA ATGCATTCTT GTTGATGAAA 240 

CTGATCGTGT TGTGGGGCAT GTCAGCAAGT ATAATTGTCA TCTGATGGAA AATATTGAAG 300 

CCAAGAATTT GCTGCACAGG GCTTTTAGTG TATTTTTATT CAACTCGAAG TATGAGTTGC 360 

TTCTCCAGCA AAGGTCAAAC ACAAAGGTTA CGTTCCCTCT AGTGTGGACT AACACTTGTT 420 

GCAGCCATCC TCTTTACCGT GAATCAGAGC TTATCCAGGA CAATGCACTA GGTGTGAGGA 480 

ATGCTGCACA AAGAAAGCTT CTCGATGAGC TTGGTATTGT AGCTGAAGAT GTACCAGTCG 540

ATGAGTTCAC TCCCTTGGGA CGTATGCTGT ACAAGGCTCC TTCTGATGGC AAATGGGGAG 600 

AGCATGAACT TGATTACTTG CTCTTCATCG TGCGAGACGT GAAGGTTCAA CCAAACCCAG 660 

ATGAAGTAGC TGAGATCAAG TATGTGAGCC GGGAAGAGCT GAAGGAGCTG GTGAAGAAAG 720 

CAGATGCAGG TGAGGAAGGT TTGAAACTGT CACCATGGTT CAGATTGGTG GTGGACAATT 780 

TCTTGATGAA GTGGTGGGAT CATGTTGAGA AAGGAACTTT GGTTGAAGCT ATAGACATGA 840 

AAACCATCCA CAAACTCTGA ACATCTTTTT TTAAAGTTTT TAAATCAATC AACTTTCTCT 900 

TCATCATTTT TATCTTTTCG ATGATAATAA TTTGGGATAT GTGAGACACT TACAAAACTT 960 

CCAAGCACCT CAGGCAATAA TAAAGTTTGC GGCCGC 996 
(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1165 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:
CTCGGTAGCT GGCCACAATC GCTATTTGGA ACCTGGCCCG GCGGCAGTCC GATGCCGCGA 60
TGCTTCGTTC GTTGCTCAGA GGCCTCACGC ATATCCCCCG CGTGAACTCC GCCCAGCAGC 120
CCAGCTGTGC ACACGCGCGA CTCCAGTTTA AGCTCAGGAG CATGCAGATG ACGCTCATGC 180
AGCCCAGCAT CTCAGCCAAT CTGTCGCGCG CCGAGGACCG CACAGACCAC ATGAGGGGTG 240
CAAGCACCTG GGCAGGCGGG CAGTCGCAGG ATGAGCTGAT GCTGAAGGAC GAGTGCATCT 300 
TGGTGGATGT TGAGGACAAC ATCACAGGCC ATGCCAGCAA GCTGGAGTGT CACAAGTTCC 360 
TACCACATCA GCCTGCAGGC CTGCTGCACC GGGCCTTCTC TGTGTTCCTG TTTGACGATC 420 
AGGGGCGACT GCTGCTGCAA CAGCGTGCAC GCTCAAAAAT CACCTTCCCA AGTGTGTGGA 480 
CGAACACCTG CTGCAGCCAC CCTTTACATG GGCAGACCCC AGATGAGGTG GACCAACTAA 540 
GCCAGGTGGC CGACGGAACA GTACCTGGCG CAAAGGCTGC TGCCATCCGC AAGTTGGAGC 600 
ACGAGCTGGG GATACCAGCG CACCAGCTGC CGGCAAGCGC GTTTCGCTTC CTCACGCGTT 660 
TGCACTACTG TGCCGCGGAC GTGCAGCCAG CTGCGACACA ATCAGCGCTC TGGGGCGAGC 720 
ACGAAATGGA CTACATCTTG TTCATCCGGG CCAACGTCAC CTTGGCGCCC AACCCTGACG 780 
AGGTGGACGA AGTCAGGTAC GTGACGCAAG AGGAGCTGCG GCAGATGATG CAGCCGGACA 840

ACGGGCTGCA ATGGTCGCCG TGGTTTCGCA TCATCGCCGC GCGCTTCCTT GAGCGTTGGT 900 

GGGCTGACCT GGACGCGGCC CTAAACACTG ACAAACACGA GGATTGGGGA ACGGTGCATC 960 

ACATCAACGA AGCGTGAAAG CAGAAGCTGC AGGATGTGAA GACACGTCAT GGGGTGGAAT 1020 

TGCGTACTTG GCAGCTTCGT ATCTCCTTTT TCTGAGACTG AACCTGCAGT CAGGTCCCAC 1080 

AAGGTCAGGT AAAATGGCTC GATAAAATGT ACCGTCACTT TTTGTCGCGT ATACTGAACT 1140 

CCAAGAGGTC AAAAAAAAAA AAAAA 1165 
(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1135 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 
CTCGGTAGCT GGCCACAATC GCTATTTGGA ACCTGGCCCG GCGGCAGTCC GATGCCGCGA 60
TGCTTCGTTC GTTGCTCAGA GGCCTCACGC ATATCCCGCG CGTGAACTCC GCCCAGCAGC 120
CCAGCTGTGC ACACGCGCGA CTCCAGTTTA AGCTCAGGAG CATGCAGCTG CTTTCCGAGG 180
ACCGCACAGA CCACATGAGG GGTGCAAGCA CCTGGGCAGG CGGGCAGTCG CAGGATGAGC 240
TGATGCTGAA GGACGAGTGC ATCTTGGTAG ATGTTGAGGA CAACATCACA GGCCATGCCA 300

GCAAGCTGGA GTGTCACAAG TTCCTACCAC ATCAGCCTGC AGGCCTGCTG CACCGGGCCT 360 

TCTCTGTGTT CCTGTTTGAC GATCAGGGGC GACTGCTGCT GCAACAGCGT GCACGCTCAA 420 

AAATCACCTT CCCAAGTGTG TGGACGAACA CCTGCTGCAG CCACCCTTTA CATGGGCAGA 480 

CCCCAGATGA GGTGGACCAA CTAAGCCAGG TGGCCGACGG AACAGTACCT GGCGCAAAGG 540 

CTGCTGCCAT CCGCAAGTTG GAGCACGAGC TGGGGATACC AGCGCACCAG CTGCCGGCAA 600 

GCGCGTTTCG CTTCCTCACG CGTTTGCACT ACTGTGCCGC GGACGTGCAG CCAGCTGCGA 660 

CACAATCAGC GCTCTGGGGC GAGCACGAAA TGGACTACAT CTTGTTCATC CGGGCCAACG 720 

TCACCTTGGC GCCCAACCCT GACGAGGTGG ACGAAGTCAG GTACGTGACG CAAGAGGAGC 780 

TGCGGCAGAT GATGCAGCCG GACAACGGGC TTCAATGGTC GCCGTGGTTT CGCATCATCG 840 

CCGCGCGCTT CCTTGAGCGT TGGTGGGCTG ACCTGGACGC GGCCCTAAAC ACTGACAAAC 900 

ACGAGGATTG GGGAACGGTG CATCACATCA ACGAAGCGTG AAGGCAGAAG CTGCAGGATG 960 

TGAAGACACG TCATGGGGTG GAATTGCGTA CTTGGCAGCT TCGTATCTCC TTTTTCTGAG 1020 

ACTGAACCTG CAGAGCTAGA GTCAATGGTG CATCATATTC ATCGTCTCTC TTTTGTTTTA 1080 

GACTAATCTG TAGCTAGAGT CACTGATGAA TCCTTTACAA CTTTCAAAAA AAAAA 1135 
(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 960 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

CCAAAAACAA CTCAAATCTC CTCCGTCGCT CTTACTCCGC CATGGGTGAC GACTCCGGCA 60 

TGGATGCTGT TCAGCGACGT CTCATGTTTG ACGATGAATG CATTTTGGTG GATGAGTGTG 120 

ACAATGTGGT GGGACATGAT ACCAAATACA ATTGTCACTT GATGGAGAAG ATTGAAACAG 180 

GTAAAATGCT GCACAGAGCA TTCAGCGTTT TTCTATTCAA TTCAAAATAC GAGTTACTTC 240 

TTCAGCAACG GTCTGCAACC AAGGTGACAT TTCCTTTAGT ATGGACCAAC ACCTGTTGCA 300 

GCCATCCACT CTACAGAGAA TCCGAGCTTG TTCCCGAAAC GCCTGAGAGA ATGCTGCACA 360 
GAGGANNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 420

NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 480 

NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 540 

NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 600 

NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 660 

NNNNNNNNNN NNNNNNNNNN TCATGTGCAA AAGGGTACAC TCACTGAATG CAATTTGATA 720 

TGAAAACCAT ACACAAGCTG ATATAGAAAC ACACCCTCAA CCGAAAAGCA AGCCTAATAA 780 

TTCGGGTTGG GTCGGGTCTA CCATCAATTG TTTTTTTCTT TTAACAACTT TTAATCTCTA 840

TTTGAGCATG TTGATTCTTG TCTTTTGTGT GTAAGATTTT GGGTTTCGTT TCAGTTGTAA 900 

TAATGAACCA TTGATGGTTT GCAATTTCAA GTTCCTATCG ACATGTAGTG ATCTAAAAAA 960 

(2) INFORMATION FOR SEQ ID NO : 14 : 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 305 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

Met Leu Arg Ser Leu Leu Arg Gly Leu Thr His Ile Pro Arg Val Asn
1 5 10 15

Ser Ala Gln Gln Pro Ser Cys Ala His Ala Arg Leu Gln Phe Lys Leu
20 25 30

Arg Ser Met Gln Met Thr Leu Met Gln Pro Ser Ile Ser Ala Asn Leu
35 40 45

Ser Arg Ala Glu Asp Arg Thr Asp His Met Arg Gly Ala Ser Thr Trp
50 55 60

Ala Gly Gly Gln Ser Gln Asp Glu Leu Met Leu Lys Asp Glu Cys Ile
65 70 75 80

Leu Val Asp Val Glu Asp Asn Ile Thr Gly His Ala Ser Lys Leu Glu
85 90 95

Cys His Lys Phe Leu Pro His Gln Pro Ala Gly Leu Leu His Arg Ala
100 105 110

Phe Ser Val Phe Leu Phe Asp Asp Gln Gly Arg Leu Leu Leu Gln Gln
115 120 125

Arg Ala Arg Ser Lys Ile Thr Phe Pro Ser Val Trp Thr Asn Thr Cys
130 135 140

Cys Ser His Pro Leu His Gly Gln Thr Pro Asp Glu Val Asp Gln Leu
145 150 155 160

Ser Gln Val Ala Asp Gly Thr Val Pro Gly Ala Lys Ala Ala Ala Ile
165 170 175

Arg Lys Leu Glu His Glu Leu Gly Ile Pro Ala His Gln Leu Pro Ala
180 185 190

Ser Ala Phe Arg Phe Leu Thr Arg Leu His Tyr Cys Ala Ala Asp Val
195 200 205

Gln Pro Ala Ala Thr Gln Ser Ala Leu Trp Gly Glu His Glu Met Asp
210 215 220

Tyr Ile Leu Phe Ile Arg Ala Asn Val Thr Leu Ala Pro Asn Pro Asp
225 230 235 240

Glu Val Asp Glu Val Arg Tyr Val Thr Gln Glu Glu Leu Arg Gln Met
245 250 255

Met Gln Pro Asp Asn Gly Leu Gln Trp Ser Pro Trp Phe Arg Ile Ile
260 265 270

Ala Ala Arg Phe Leu Glu Arg Trp Trp Ala Asp Leu Asp Ala Ala Leu
275 280 285

Asn Thr Asp Lys His Glu Asp Trp Gly Thr Val His His Ile Asn Glu
290 295 300

Ala 
305 

(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 293 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:15: 

Met Leu Arg Ser Leu Leu Arg Gly Leu Thr His Ile Pro Arg Val Asn
1 5 10 15

Ser Ala Gln Gln Pro Ser Cys Ala His Ala Arg Leu Gln Phe Lys Leu
20 25 30

Arg Ser Met Gln Leu Leu Ser Glu Asp Arg Thr Asp His Met Arg Gly
35 40 45

Ala Ser Thr Trp Ala Gly Gly Gln Ser Gln Asp Glu Leu Met Leu Lys
50 55 60

Asp Glu Cys Ile Leu Val Asp Val Glu Asp Asn Ile Thr Gly His Ala
65 70 75 80

Ser Lys Leu Glu Cys His Lys Phe Leu Pro His Gln Pro Ala Gly Leu
85 90 95

Leu His Arg Ala Phe Ser Val Phe Leu Phe Asp Asp Gln Gly Arg Leu
100 105 110

Leu Leu Gln Gln Arg Ala Arg Ser Lys Ile Thr Phe Pro Ser Val Trp
115 120 125

Thr Asn Thr Cys Cys Ser His Pro Leu His Gly Gln Thr Pro Asp Glu
130 135 140

Val Asp Gln Leu Ser Gln Val Ala Asp Gly Thr Val Pro Gly Ala Lys
145 150 155 160

Ala Ala Ala Ile Arg Lys Leu Glu His Glu Leu Gly Ile Pro Ala His
165 170 175

Gln Leu Pro Ala Ser Ala Phe Arg Phe Leu Thr Arg Leu His Tyr Cys
180 185 190

Ala Ala Asp Val Gln Pro Ala Ala Thr Gln Ser Ala Leu Trp Gly Glu
195 200 205

His Glu Met Asp Tyr Ile Leu Phe Ile Arg Ala Asn Val Thr Leu Ala
210 215 220

Pro Asn Pro Asp Glu Val Asp Glu Val Arg Tyr Val Thr Gln Glu Glu
225 230 235 240

Leu Arg Gln Met Met Gln Pro Asp Asn Gly Leu Gln Trp Ser Pro Trp
245 250 255

Phe Arg Ile Ile Ala Ala Arg Phe Leu Glu Arg Trp Trp Ala Asp Leu
260 265 270

Asp Ala Ala Leu Asn Thr Asp Lys His Glu Asp Trp Gly Thr Val His
275 280 285

His Ile Asn Glu Ala
290 

(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 284 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 

Met Ser Val Ser Ser Leu Phe Asn Leu Pro Leu Ile Arg Leu Arg Ser
1 5 10 15

Leu Ala Leu Ser Ser Ser Phe Ser Ser Phe Arg Phe Ala His Arg Pro
20 25 30

Leu Ser Ser Ile Ser Pro Arg Lys Leu Pro Asn Phe Arg Ala Phe Ser
35 40 45

Gly Thr Ala Met Thr Asp Thr Lys Asp Ala Gly Met Asp Ala Val Gln
50 55 60

Arg Arg Leu Met Phe Glu Asp Glu Cys Ile Leu Val Asp Glu Thr Asp
65 70 75 80

Arg Val Val Gly His Val Ser Lys Tyr Asn Cys His Leu Met Glu Asn
85 90 95

Ile Glu Ala Lys Asn Leu Leu His Arg Ala Phe Ser Val Phe Leu Phe
100 105 110

Asn Ser Lys Tyr Glu Leu Leu Leu Gln Gln Arg Ser Asn Thr Lys Val
115 120 125

Thr Phe Pro Leu Val Trp Thr Asn Thr Cys Cys Ser His Pro Leu Tyr
130 135 140

Arg Glu Ser Glu Leu Ile Gln Asp Asn Ala Leu Gly Val Arg Asn Ala
145 150 155 160

Ala Gln Arg Lys Leu Leu Asp Glu Leu Gly Ile Val Ala Glu Asp Val
165 170 175

Pro Val Asp Glu Phe Thr Pro Leu Gly Arg Met Leu Tyr Lys Ala Pro
180 185 190

Ser Asp Gly Lys Trp Gly Glu His Glu Leu Asp Tyr Leu Leu Phe Ile
195 200 205

Val Arg Asp Val Lys Val Gln Pro Asn Pro Asp Glu Val Ala Glu Ile
210 215 220

Lys Tyr Val Ser Arg Glu Glu Leu Lys Glu Leu Val Lys Lys Ala Asp
225 230 235 240

Ala Gly Glu Glu Gly Leu Lys Leu Ser Pro Trp Phe Arg Leu Val Val
245 250 255

Asp Asn Phe Leu Met Lys Trp Trp Asp His Val Glu Lys Gly Thr Leu
260 265 270

Val Glu Ala Ile Asp Met Lys Thr Ile His Lys Leu
275 280 

(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 287 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 

Met Ser Ser Ser Met Leu Asn Phe Thr Ala Ser Arg Ile Val Ser Leu
1 5 10 15

Pro Leu Leu Ser Ser Pro Pro Ser Arg Val His Leu Pro Leu Cys Phe
20 25 30

Phe Ser Pro Ile Ser Leu Thr Gln Arg Phe Ser Ala Lys Leu Thr Phe
35 40 45

Ser Ser Gln Ala Thr Thr Met Gly Glu Val Val Asp Ala Gly Met Asp
50 55 60

Ala Val Gln Arg Arg Leu Met Phe Glu Asp Glu Cys Ile Leu Val Asp
65 70 75 80

Glu Asn Asp Lys Val Val Gly His Glu Ser Lys Tyr Asn Cys His Leu
85 90 95

Met Glu Lys Ile Glu Ser Glu Asn Leu Leu His Arg Ala Phe Ser Val
100 105 110

Phe Leu Phe Asn Ser Lys Tyr Glu Leu Leu Leu Gln Gln Arg Ser Ala
115 120 125

Thr Lys Val Thr Phe Pro Leu Val Trp Thr Asn Thr Cys Cys Ser His
130 135 140

Pro Leu Tyr Arg Glu Ser Glu Leu Ile Asp Glu Asn Cys Leu Gly Val
145 150 155 160

Arg Asn Ala Ala Gln Arg Lys Leu Leu Asp Glu Leu Gly Ile Pro Ala
165 170 175

Glu Asp Leu Pro Val Asp Gln Phe Ile Pro Leu Ser Arg Ile Leu Tyr
180 185 190

Lys Ala Pro Ser Asp Gly Lys Trp Gly Glu His Glu Leu Asp Tyr Leu
195 200 205

Leu Phe Ile Ile Arg Asp Val Asn Leu Asp Pro Asn Pro Asp Glu Val
210 215 220

Ala Glu Val Lys Tyr Met Asn Arg Asp Asp Leu Lys Glu Leu Leu Arg
225 230 235 240

Lys Ala Asp Ala Glu Glu Glu Gly Val Lys Leu Ser Pro Trp Phe Arg
245 250 255

Leu Val Val Asp Asn Phe Leu Phe Lys Trp Trp Asp His Val Glu Lys
260 265 270

Gly Ser Leu Lys Asp Ala Ala Asp Met Lys Thr Ile His Lys Leu
275 280 285 

(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 261 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 

Thr Gly Pro Pro Pro Arg Phe Phe Pro Ile Arg Ser Pro Val Pro Arg
1 5 10 15

Thr Gln Leu Phe Val Arg Ala Phe Ser Ala Val Thr Met Thr Asp Ser
20 25 30

Asn Asp Ala Gly Met Asp Ala Val Gln Arg Arg Leu Met Phe Glu Asp
35 40 45

Glu Cys Ile Leu Val Asp Glu Asn Asn Arg Val Val Gly His Asp Thr
50 55 60

Lys Tyr Asn Cys His Leu Met Glu Lys Ile Glu Ala Glu Asn Leu Leu
65 70 75 80

His Arg Ala Phe Ser Val Phe Leu Phe Asn Ser Lys Tyr Glu Leu Leu
85 90 95

Leu Gln Gln Arg Ser Lys Thr Lys Val Thr Phe Pro Leu Val Trp Thr
100 105 110

Asn Thr Cys Cys Ser His Pro Leu Tyr Arg Glu Ser Glu Leu Ile Glu
115 120 125

Glu Asn Val Leu Gly Val Arg Asn Ala Ala Gln Arg Lys Leu Phe Asp
130 135 140

Glu Leu Gly Ile Val Ala Glu Asp Val Pro Val Asp Glu Phe Thr Pro
145 150 155 160

Leu Gly Arg Met Leu Tyr Lys Ala Pro Ser Asp Gly Lys Trp Gly Glu
165 170 175

His Glu Val Asp Tyr Leu Leu Phe Ile Val Arg Asp Val Lys Leu Gln
180 185 190

Pro Asn Pro Asp Glu Val Ala Glu Ile Lys Tyr Val Ser Arg Glu Glu
195 200 205

Leu Lys Glu Leu Val Lys Lys Ala Asp Ala Gly Asp Glu Ala Val Lys
210 215 220

Leu Ser Pro Trp Phe Arg Leu Val Val Asp Asn Phe Leu Met Lys Trp
225 230 235 240

Trp Asp His Val Glu Lys Gly Thr Ile Thr Glu Ala Ala Asp Met Lys
245 250 255

Thr Ile His Lys Leu
260 

(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 288 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:19: 

Met Thr Ala Asp Asn Asn Ser Met Pro His Gly Ala Val Ser Ser Tyr 
1 5 10 15

Ala Lys Leu Val Gln Asn Gln Thr Pro Glu Asp Ile Leu Glu Glu Phe
20 25 30

Pro Glu Ile Ile Pro Leu Gln Gln Arg Pro Asn Thr Arg Ser Ser Glu
35 40 45 

Thr Ser Asn Asp Glu Ser Gly Glu Thr Cys Phe Ser Gly His Asp Glu 
50 55 60 



WO 97/36998 PCT/US97/00540

Glu Gln Ile Lys Leu Met Asn Glu Asn Cys Ile Val Leu Asp Trp Asp
65 70 75 80

Asp Asn Ala Ile Gly Ala Gly Thr Lys Lys Val Cys His Leu Met Glu
85 90 95

Asn Ile Glu Lys Gly Leu Leu His Arg Ala Phe Ser Val Phe Ile Phe
100 105 110

Asn Glu Gln Gly Glu Leu Leu Leu Gln Gln Arg Ala Thr Glu Lys Ile
115 120 125

Thr Phe Pro Asp Leu Trp Thr Asn Thr Cys Cys Ser His Pro Leu Cys
130 135 140

Ile Asp Asp Glu Leu Gly Leu Lys Gly Lys Leu Asp Asp Lys Ile Lys
145 150 155 160

Gly Ala Ile Thr Ala Ala Val Arg Lys Leu Asp His Glu Leu Gly Ile
165 170 175

Pro Glu Asp Glu Thr Lys Thr Arg Gly Lys Phe His Phe Leu Asn Arg
180 185 190

Ile His Tyr Met Ala Pro Ser Asn Glu Pro Trp Gly Glu His Glu Ile
195 200 205

Asp Tyr Ile Leu Phe Tyr Lys Ile Asn Ala Lys Glu Asn Leu Thr Val
210 215 220

Asn Pro Asn Val Asn Glu Val Arg Asp Phe Lys Trp Val Ser Pro Asn
225 230 235 240

Asp Leu Lys Thr Met Phe Ala Asp Pro Ser Tyr Lys Phe Thr Pro Trp
245 250 255

Phe Lys Ile Ile Cys Glu Asn Tyr Leu Phe Asn Trp Trp Glu Gln Leu
260 265 270

Asp Asp Leu Ser Glu Val Glu Asn Asp Arg Gln Ile His Arg Met Leu
275 280 285 



(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 456 amino acids 

(B) TYPE: amino acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 

Met Asp Thr Leu Leu Lys Thr Pro Asn Leu Glu Phe Leu Pro His Gly
1 5 10 15

Phe Val Lys Ser Phe Ser Lys Phe Gly Lys Cys Glu Gly Val Cys Val
20 25 30

Lys Ser Ser Ala Leu Leu Glu Leu Val Pro Glu Thr Lys Lys Glu Asn
35 40 45

Leu Asp Phe Glu Leu Pro Met Tyr Asp Pro Ser Lys Gly Val Val Asp
50 55 60

Leu Ala Val Val Gly Gly Gly Pro Ala Gly Leu Ala Val Ala Gln Gln
65 70 75 80

Val Ser Glu Ala Gly Leu Ser Val Cys Ser Ile Asp Pro Pro Lys Leu
85 90 95

Ile Trp Pro Asn Asn Tyr Gly Val Trp Val Asp Glu Phe Glu Ala Met
100 105 110

Asp Leu Leu Asp Cys Leu Asp Ala Thr Trp Ser Gly Ala Val Tyr Ile
115 120 125

Asp Asp Thr Lys Asp Leu Arg Pro Tyr Gly Arg Val Asn Arg Lys Gln
130 135 140

Leu Lys Ser Lys Met Met Gln Lys Cys Ile Asn Gly Val Lys Phe His
145 150 155 160

Gln Ala Lys Val Ile Lys Val Ile His Glu Glu Lys Ser Met Leu Ile
165 170 175

Cys Asn Asp Gly Thr Ile Gln Ala Thr Val Val Leu Asp Ala Thr Gly
180 185 190

Phe Ser Arg Leu Val Gln Tyr Asp Lys Pro Tyr Asn Pro Gly Tyr Gln
195 200 205

Val Ala Tyr Gly Ile Leu Ala Glu Val Glu Glu His Pro Phe Asp Lys
210 215 220

Met Val Phe Met Asp Trp Arg Asp Ser His Leu Asn Asn Glu Leu Lys
225 230 235 240

Glu Arg Asn Ser Ile Pro Thr Phe Leu Tyr Ala Met Pro Phe Ser Ser
245 250 255

Asn Arg Ile Phe Leu Glu Glu Thr Ser Leu Val Ala Arg Pro Gly Leu
260 265 270

Arg Met Asp Asp Ile Gln Glu Arg Met Val Ala Arg Leu His Leu Gly
275 280 285

Ile Lys Val Lys Ser Ile Glu Glu Asp Glu His Cys Val Ile Pro Met
290 295 300

Gly Gly Pro Leu Pro Val Leu Pro Gln Arg Val Val Gly Ile Gly Gly
305 310 315 320

Thr Ala Gly Met Val His Pro Ser Thr Gly Tyr Met Val Ala Arg Thr
325 330 335

Leu Ala Ala Ala Pro Val Val Ala Asn Ala Ile Ile Tyr Leu Gly Ser
340 345 350

Glu Ser Ser Gly Glu Leu Ser Ala Glu Val Trp Lys Asp Leu Trp Pro
355 360 365

Ile Glu Arg Arg Arg Gln Arg Glu Phe Phe Cys Phe Gly Met Asp Ile
370 375 380

Leu Leu Lys Leu Asp Leu Pro Ala Thr Arg Arg Phe Phe Asp Ala Phe
385 390 395 400

Phe Asp Leu Glu Pro Arg Tyr Trp His Gly Phe Leu Ser Ser Arg Leu
405 410 415

Phe Leu Pro Glu Leu Ile Val Phe Gly Leu Ser Leu Phe Ser His Ala
420 425 430

Ser Asn Thr Ser Arg Glu Ile Met Thr Lys Gly Thr Pro Leu Val Met
435 440 445

Ile Asn Asn Leu Leu Gln Asp Glu
450 455 

(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 524 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:21: 

Met Glu Cys Val Gly Ala Arg Asn Phe Ala Ala Met Ala Val Ser Thr 
1 5 10 15

Phe Pro Ser Trp Ser Cys Arg Arg Lys Phe Pro Val Val Lys Arg Tyr
20 25 30

Ser Tyr Arg Asn Ile Arg Phe Gly Leu Cys Ser Val Arg Ala Ser Gly
35 40 45

Gly Gly Ser Ser Gly Ser Glu Ser Cys Val Ala Val Arg Glu Asp Phe
50 55 60

Ala Asp Glu Glu Asp Phe Val Lys Ala Gly Gly Ser Glu Ile Leu Phe
65 70 75 80

Val Gln Met Gln Gln Asn Lys Asp Met Asp Glu Gln Ser Lys Leu Val
85 90 95

Asp Lys Leu Pro Pro Ile Ser Ile Gly Asp Gly Ala Leu Asp His Val
100 105 110

Val Ile Gly Cys Gly Pro Ala Gly Leu Ala Leu Ala Ala Glu Ser Ala
115 120 125

Lys Leu Gly Leu Lys Val Gly Leu Ile Gly Pro Asp Leu Pro Phe Thr
130 135 140

Asn Asn Tyr Gly Val Trp Glu Asp Glu Phe Asn Asp Leu Gly Leu Gln
145 150 155 160

Lys Cys Ile Glu His Val Trp Arg Glu Thr Ile Val Tyr Leu Asp Asp
165 170 175

Asp Lys Pro Ile Thr Ile Gly Arg Ala Tyr Gly Arg Val Ser Arg Arg
180 185 190

Leu Leu His Glu Glu Leu Leu Arg Arg Cys Val Glu Ser Gly Val Ser
195 200 205

Tyr Leu Ser Ser Lys Val Asp Ser Ile Thr Glu Ala Ser Asp Gly Leu
210 215 220

Arg Leu Val Ala Cys Asp Asp Asn Asn Val Ile Pro Cys Arg Leu Ala
225 230 235 240

Thr Val Ala Ser Gly Ala Ala Ser Gly Lys Leu Leu Gln Tyr Glu Val
245 250 255

Gly Gly Pro Arg Val Cys Val Gln Thr Ala Tyr Gly Val Glu Val Glu
260 265 270

Val Glu Asn Ser Pro Tyr Asp Pro Asp Gln Met Val Phe Met Asp Tyr
275 280 285

Arg Asp Tyr Thr Asn Glu Lys Val Arg Ser Leu Glu Ala Glu Tyr Pro
290 295 300

Thr Phe Leu Tyr Ala Met Pro Met Thr Lys Ser Arg Leu Phe Phe Glu
305 310 315 320

Glu Thr Cys Leu Ala Ser Lys Asp Val Met Pro Phe Asp Leu Leu Lys
325 330 335

Thr Lys Leu Met Leu Arg Leu Asp Thr Leu Gly Ile Arg Ile Leu Lys
340 345 350

Thr Tyr Glu Glu Glu Trp Ser Tyr Ile Pro Val Gly Gly Ser Leu Pro
355 360 365

Asn Thr Glu Gln Lys Asn Leu Ala Phe Gly Ala Ala Ala Ser Met Val
370 375 380

His Pro Ala Thr Gly Tyr Ser Val Val Arg Ser Leu Ser Glu Ala Pro
385 390 395 400

Lys Tyr Ala Ser Val Ile Ala Glu Ile Leu Arg Glu Glu Thr Thr Lys
405 410 415

Gln Ile Asn Ser Asn Ile Ser Arg Gln Ala Trp Asp Thr Leu Trp Pro
420 425 430

Pro Glu Arg Lys Arg Gln Arg Ala Phe Phe Leu Phe Gly Leu Ala Leu
435 440 445

Ile Val Gln Phe Asp Thr Glu Gly Ile Arg Ser Phe Phe Arg Thr Phe
450 455 460

Phe Arg Leu Pro Lys Trp Met Trp Gln Gly Phe Leu Gly Ser Thr Leu
465 470 475 480

Thr Ser Gly Asp Leu Val Leu Phe Ala Leu Tyr Met Phe Val Ile Ser
485 490 495

Pro Asn Asn Leu Arg Lys Gly Leu Ile Asn His Leu Ile Ser Asp Pro
500 505 510

Thr Gly Ala Thr Met Ile Lys Thr Tyr Leu Lys Val
515 520 
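The listings above give each residue as a three-letter code, with position numbers on the line beneath. When working with such a listing electronically, it can help to convert it to one-letter codes and then check the residue count against the declared LENGTH (for example, 305 for SEQ ID NO: 14). The Python sketch below shows one way to do this. The names `to_one_letter` and `OCR_ALIASES` are hypothetical. The alias table is an assumption that covers a few misreadings typical of scanned copies (such as "Gin" for Gln, or "He" and "lie" for Ile); it is not part of the listing itself.

```python
# Standard IUPAC three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Hypothetical alias table: assumed OCR misreadings in scanned listings.
OCR_ALIASES = {"Gin": "Gln", "He": "Ile", "lie": "Ile", "Cya": "Cys",
               "Ldu": "Leu", "ser": "Ser"}


def to_one_letter(listing: str) -> str:
    """Convert a three-letter residue listing to a one-letter string.

    Position-number rows (e.g. "1 5 10 15") and any other unrecognized
    tokens are skipped silently, so compare len(result) with the declared
    LENGTH to detect residues that were dropped.
    """
    residues = []
    for token in listing.split():
        token = OCR_ALIASES.get(token, token)  # repair known misreadings
        if token in THREE_TO_ONE:
            residues.append(THREE_TO_ONE[token])
    return "".join(residues)


if __name__ == "__main__":
    first_row = """Met Leu Arg Ser Leu Leu Arg Gly Leu Thr His Ile Pro Arg Val Asn
    1 5 10 15"""
    seq = to_one_letter(first_row)
    print(seq, len(seq))  # MLRSLLRGLTHIPRVN 16
```

Because unrecognized tokens are dropped rather than rejected, a length mismatch against the LENGTH field is the signal that a residue was misread beyond what the alias table covers.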

Claims

1. An isolated eukaryotic enzyme having the amino acid 
sequence of SEQ ID NO: 2, 4, 14, 15, 16 or 18. 

2. An isolated eukaryotic enzyme of Claim 1 which is an ε
cyclase enzyme having the amino acid sequence of SEQ ID NO: 2.

3. An isolated DNA sequence comprising a gene encoding
the eukaryotic ε cyclase of Claim 2.

4. The isolated DNA sequence according to Claim 3, 
having the nucleic acid sequence of SEQ ID NO: 1. 

5. An expression vector comprising the DNA sequence of
Claim 3.

6. The expression vector according to Claim 5 which is 
pATeps deposited with the American Type Culture Collection on 
March 4, 1996 under accession number 98005. 

7. A host containing the expression vector of Claim 5. 

8. A host containing the expression vector of Claim 6. 

9. An isolated eukaryotic enzyme of Claim 1, which is an 
isopentenyl isomerase (IPP) enzyme having the amino acid 
sequence of SEQ ID NOS: 14, 15, 16 or 18. 

10. An isolated DNA sequence comprising a gene encoding 
the IPP enzyme of Claim 9. 

11. The isolated DNA sequence of Claim 10, having the
nucleic acid sequence of SEQ ID NOS: 9, 10, 11 or 12.

12. An expression vector comprising the DNA sequence of 
Claim 10. 

13. The expression vector of Claim 12 which is pHP05,
pMDP1, pATDP7 or pHP04, deposited with the American Type
Culture Collection on March 4, 1996 under accession Nos.
98000, 98001, 98002 or 98004.

14. A host containing the expression vector of Claim 12.

15. The isolated eukaryotic enzyme of Claim 1, which is a
β-carotene hydroxylase enzyme having the amino acid sequence
of SEQ ID NO: 4.

16. An isolated DNA sequence comprising a gene encoding
the β-carotene hydroxylase enzyme of Claim 15.

17. The isolated DNA sequence according to Claim 16,
having the nucleic acid sequence of SEQ ID NO: 3.

18. An expression vector comprising the DNA sequence of 
Claim 16. 

19. The expression vector according to Claim 18 which is 
pATOHB deposited with the American Type Culture Collection on 
March 4, 1996 under accession number 98003. 

20. A host containing the expression vector of Claim 18. 

21. A host containing the expression vector of Claim 19. 

22. A DNA sequence which, when incorporated into a
prokaryotic host, results in the expression of a eukaryotic
carotenoid biosynthetic enzyme,

wherein said DNA sequence comprises a truncated portion
of the naturally occurring DNA sequence encoding said
eukaryotic carotenoid biosynthetic enzyme, wherein said
truncated portion comprises said natural sequence minus at
least one codon at the 5' terminus.

23. The DNA sequence of Claim 22, wherein said eukaryotic
carotenoid biosynthetic enzyme is β-carotene hydroxylase.

24. The DNA sequence of Claim 23, which is a BglII-3'
end exofragment of SEQ ID NO: 3 fused to a 5' ATG start codon.

25. A method for screening for eukaryotic genes involved 
in carotenoid biosynthesis, metabolism or degradation 
comprising the steps of: 

engineering a prokaryotic host which accumulates a
carotenoid or carotenoid precursor or which is deficient in an 
enzyme of the carotenoid pathway; 

transforming said host with DNA which may contain a
eukaryotic carotenoid biosynthetic gene;

culturing said transformed host to obtain colonies; and 

screening for colonies exhibiting a visual appearance
different from that of colonies of the untransformed host.

26. The method of Claim 25, wherein said prokaryotic host 
is E. coli. 

27. A method for producing a carotenoid, comprising the 
steps of: 

transforming a host with DNA which comprises a eukaryotic 
carotenoid biosynthetic gene; 

culturing said host for a time sufficient for said host
to produce said carotenoid; and

collecting said carotenoid from the host. 

28. The method of Claim 27, wherein said DNA further
comprises an isopentenyl pyrophosphate isomerase gene.

29. A method for inhibiting carotenoid biosynthesis in a 
host, comprising the steps of: 

transforming said host with antisense DNA to a eukaryotic 
carotenoid biosynthesis gene; and

culturing said host.

30. A method for increasing production of a secondary
metabolite of isopentenyl pyrophosphate (IPP) by a host,
comprising the steps of:

transforming said host with DNA that comprises an
isopentenyl pyrophosphate isomerase gene;

culturing said host for a time sufficient to produce said 
secondary metabolite; and 

recovering said secondary metabolite from said host. 

31. The method of Claim 30, wherein said secondary 
metabolite is a carotenoid. 

32. A method for screening for secondary metabolites,
comprising:

engineering a host which accumulates a secondary
metabolite or secondary metabolite precursor of isopentenyl
pyrophosphate (IPP);

transforming said host with DNA that may contain an IPP
isomerase gene;

culturing said host for a time sufficient to accumulate 
said secondary metabolite or precursor; and 

screening for said secondary metabolite or precursor. 
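Claims 22 and 24 describe a coding sequence that lacks at least one codon at the 5' end of the natural sequence and, in claim 24, is fused to a 5' ATG start codon. The Python sketch below illustrates only the sequence-level idea, under stated assumptions. The helper name `truncate_5prime` and the toy sequence are hypothetical. The input is assumed to be an in-frame coding sequence. The sketch does not reproduce the specific restriction fragment of SEQ ID NO: 3 recited in claim 24, and it does not model digestion or cloning.

```python
def truncate_5prime(cds: str, codons_removed: int, add_start: bool = True) -> str:
    """Remove whole codons from the 5' end of a coding sequence and
    optionally fuse an ATG start codon in front of what remains.

    Hypothetical helper for illustration only; it works purely on strings.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    if codons_removed < 1:
        raise ValueError("at least one 5' codon must be removed")
    if 3 * codons_removed >= len(cds):
        raise ValueError("cannot remove the entire coding sequence")
    # Dropping whole codons keeps the downstream reading frame intact.
    truncated = cds[3 * codons_removed:]
    # Prepend ATG only if the remaining sequence does not already start with one.
    if add_start and not truncated.startswith("ATG"):
        truncated = "ATG" + truncated
    return truncated


if __name__ == "__main__":
    toy_cds = "ATGGCTTCTAAAGCTTAA"  # made-up 6-codon ORF, not taken from SEQ ID NO: 3
    print(truncate_5prime(toy_cds, 2))  # ATGTCTAAAGCTTAA
```

The sketch removes whole codons rather than arbitrary bases. That is what keeps the remaining sequence in frame behind the new ATG, so the downstream codons still encode the same residues.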



